Created on 2010-07-22.06:12:24 by abuiles, last changed 2011-05-10.22:07:23 by darcswatch. Tracked on DarcsWatch.
File name                                                      | Uploaded                     | Type
-resolve-issue-1599_-automatically-expire-unused-caches.dpatch | abuiles, 2010-07-28.13:47:46 | text/x-darcs-patch
resolve-issue-1599_-automatically-expire-unused-caches.dpatch  | abuiles, 2010-07-22.06:12:24 | text/x-darcs-patch
resolve-issue-1599_-automatically-expire-unused-caches.dpatch  | abuiles, 2010-07-28.20:55:19 | text/x-darcs-patch
resolve-issue-1599_-automatically-expire-unused-caches.dpatch  | abuiles, 2010-08-03.14:32:49 | text/x-darcs-patch
resolve-issue-1599_-automatically-expire-unused-caches.dpatch  | abuiles, 2010-08-06.06:41:02 | text/x-darcs-patch
resolve-issue-1599_-automatically-expire-unused-caches.dpatch  | abuiles, 2010-08-07.00:41:37 | text/x-darcs-patch
unnamed                                                        | abuiles, 2010-07-22.06:12:24 |
unnamed                                                        | abuiles, 2010-07-28.13:47:46 |
unnamed                                                        | abuiles, 2010-07-28.20:55:19 |
unnamed                                                        | abuiles, 2010-08-03.14:32:49 |
unnamed                                                        | abuiles, 2010-08-06.06:41:02 |
unnamed                                                        | abuiles, 2010-08-07.00:41:37 |
See mailing list archives for discussion on individual patches.
msg11815 | Author: abuiles | Date: 2010-07-22.06:12:24
1 patch for repository http://darcs.net:
Wed Jul 21 22:29:45 COT 2010 builes.adolfo@googlemail.com
* Resolve issue 1599: automatically expire unused caches
Attachments
msg11825 | Author: kowey | Date: 2010-07-22.15:47:56
On Thu, Jul 22, 2010 at 06:12:24 +0000, Adolfo Builes wrote:
> Wed Jul 21 22:29:45 COT 2010 builes.adolfo@googlemail.com
> * Resolve issue 1599: automatically expire unused caches
Resolve issue 1599: automatically expire unused caches
------------------------------------------------------
> builes.adolfo@googlemail.com**20100722032945
> Ignore-this: 38455634b5648ed7555380336ea74464
> ] hunk ./src/Darcs/Global.hs 30
> whenDebugMode, withDebugMode, setDebugMode,
> debugMessage, debugFail, putTiming,
> addCRCWarning, getCRCWarnings, resetCRCWarnings,
> - darcsdir
> + addBadCache, getBadCacheList, isBadCache, darcsdir,
> + isTrustedHttp, addTrustedHttp
> hunk ./src/Darcs/Global.hs 73
> Just actions <- swapMVar atexitActions Nothing
> -- from now on atexit will not register new actions
> mapM_ runAction actions
> + badCaches <- getBadCacheList
> + when ( not $ null badCaches ) $ reportBadCaches badCaches
OK, so after running all of the atExitActions, we print out our
accumulated warnings about caches (should we worry about what
happens if one of the atexit actions fails)?
> -
> + reportBadCaches caches = do
> + if length caches > 1
> + then
> + hPutStrLn stderr "\nI could not reach the following repositories (listed in _darcs/prefs/sources):"
> + else
> + hPutStrLn stderr "\nI could not reach the following repository (listed in _darcs/prefs/sources):"
For what it's worth, this could be more simply expressed using the
English module. It lets you write something like
"I could not reach the following"
++ englishNum (length caches) (Noun "repository") "(listed in ...)"
It looks like you'd have to modify Noun to account for this.
I'll submit a patch.
It may also be a good idea to give the user some advice about what
to do here.
> + mapM_ (hPutStrLn stderr) caches
> + hPutStrLn stderr ""
Alternatively, hPutStrLn (unlines caches ++ "\n")
> +type BadCacheList = [String]
> +{- NOINLINE _badCacheList -}
> +_badCacheList :: IORef BadCacheList
> +_badCacheList = unsafePerformIO $ newIORef []
> +
> +addBadCache :: String -> IO ()
> +addBadCache cache = modifyIORef _badCacheList (cache:)
> +
> +getBadCacheList :: IO [String]
> +getBadCacheList = readIORef _badCacheList
> +
> +{- NOINLINE isBadCache -}
> +isBadCache :: String -> Bool
> +isBadCache cache = unsafePerformIO $ do badCaches <- getBadCacheList
> + return (cache `elem` badCaches)
Is it really necessary for this to be String -> Bool?
Why not String -> IO Bool?
Alternatively, why not IO (String -> Bool)?
I'm also a bit sceptical about the need for a type synonym.
Maybe just passing around [String] is clearer here.
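For illustration, the String -> IO Bool alternative might look like this minimal, self-contained sketch; the locally created IORef is a stand-in for Darcs's global _badCacheList, so names and structure here are assumptions, not the patch's actual code:

```haskell
import Data.IORef (newIORef, readIORef, modifyIORef)

main :: IO ()
main = do
  -- Stand-in for Darcs's global _badCacheList.
  badCaches <- newIORef ([] :: [String])
  modifyIORef badCaches ("http://dead.example" :)
  -- The check stays inside IO, so no unsafePerformIO is needed.
  let isBadCache cache = fmap (cache `elem`) (readIORef badCaches)
  bad <- isBadCache "http://dead.example"
  print bad
```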
> +type TrustedHttps = [String]
> +{- NOINLINE _trustedHttpList -}
> +_trustedHttpList :: IORef TrustedHttps
> +_trustedHttpList = unsafePerformIO $ newIORef []
So you maintain a list of URLs which you managed to reach the
first time you contacted them this session.
'Trust' is probably not the right notion to use here (since I'd
be thinking more in terms of security or something like that).
How about "reachable" instead?
> +addTrustedHttp :: String -> IO ()
> +addTrustedHttp http = modifyIORef _trustedHttpList (http:)
> +
> +getTrustedHttps :: IO TrustedHttps
> +getTrustedHttps = readIORef _trustedHttpList
> +
> +{- NOINLINE isTrustedHttp -}
> +isTrustedHttp :: String -> Bool
> +isTrustedHttp http = unsafePerformIO $ do trustedHttps <- getTrustedHttps
> + return (http `elem` trustedHttps)
Same complaint about possibly unnecessary unsafePerformIO. I mean, we
certainly use it to achieve tracking of some global information in
Darcs, but it's something we should be somewhat reluctant to do lest
Darcs surprise us in the future.
> +-- | Returns the sources of a given cache.
> +cacheSource :: CacheLoc -> String
> +cacheSource (Cache _ _ s) = s
Potentially useful helper function. Seems like we could alternatively
modify the definition of Cache to use the curly brace syntax.
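For comparison, curly-brace (record) syntax would make cacheSource an accessor for free. A sketch with placeholder constructor and field names; the real CacheType and WritableOrNot live in Darcs's cache module, so the stand-ins below are assumptions:

```haskell
-- Placeholder types standing in for Darcs's CacheType and WritableOrNot.
data CacheType = Repo | Directory deriving Show
data WritableOrNot = Writable | NotWritable deriving Show

-- Record syntax generates the cacheSource accessor automatically,
-- replacing the hand-written helper in the patch.
data CacheLoc = Cache { cacheType     :: CacheType
                      , cacheWritable :: WritableOrNot
                      , cacheSource   :: String
                      } deriving Show

main :: IO ()
main = putStrLn (cacheSource (Cache Repo NotWritable "http://darcs.net"))
```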
> fetchFileUsingCache :: Cache -> HashedDir -> String -> IO (String, B.ByteString)
> -fetchFileUsingCache = fetchFileUsingCachePrivate Anywhere
> +fetchFileUsingCache c h s = do result <- fetchFileUsingCachePrivate Anywhere c h s
> + return result
This change does not seem necessary. What was wrong with the
eta-reduced version of this code?
> sfuc [] _ = return ()
> - sfuc (c:cs) out | not $ writable c =
> + sfuc (c:cs) out | (not $ badCache c) && (not $ writable c) =
> - then speculateFileOrUrl (fn c) out
> + then speculateFileOrUrl (fn c) out `catchNonSignal` (\e -> do
> + addToBadCaches (show e) c)
> - else copyFileOrUrl [] (fn c) out Cachable
> + else copyFileOrUrl [] (fn c) out Cachable `catchNonSignal` (\e -> do
> + addToBadCaches (show e) c)
OK, so if something goes wrong when speculating on or copying a file,
(but it's not just the user hitting ^-c), we note the cache down as
being problematic.
>
> +-- | Checks if a given cache needs to be added to the list of bad caches.
> +-- It receives an error caught during execution and the cache.
> +-- For a local cache, if the given source doesn't exist anymore, it is added.
> +-- For HTTP sources if the error is timeout, it is added, if not we check if the source
> +-- still exist, if doesn't exist we added to the list of bad caches.
> +-- For SSH we check if the server is reachable.
> +-- The entries which get added to the cache are no longer tried for the rest of the command.
> +addToBadCaches :: String -> CacheLoc -> IO ()
Thanks for haddocking this.
It seems like addToBadCaches may be a bit of a misnomer: first, because
this function primarily tests whether a cache is reachable, and second,
because it adds the cache entry to either the whitelist or the
blacklist. But I'm not sure; maybe I misunderstand the code. After all,
you do only call it when something goes wrong...
Style point:
The code below is very heavily indented, which makes it harder to read
and review without a wide screen or window. It's a tricky trade-off
between too much and not enough, so I don't really have any answers.
Perhaps this style could help here
addToBadCaches e cache
| condition 1 =
...
| condition 2 =
...
Note the outdented guards, and that each case starts on a new line.
> +addToBadCaches e cache | isFile (cacheSource cache) = case "No such file or directory" `isInfixOf` (map toLower e) of
> + True -> do
> + exist <- doesDirectoryExist $ cacheSource $ cache
> + when (not exist) $ addBadCache . cacheSource $ cache
> + False -> return ()
1. I'm not sure what you gain from pattern-matching on the
Bool when you could just use an if-then-else.
2. Do we not have any exception types to work with here? Matching
on the string seems really fragile.
3. Case in point, you're matching "Foo" against (map toLower "foo")
which means that this code path is never visited.
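Point 3 is easy to demonstrate: the needle keeps its capital letters while the haystack is lower-cased, so the match can never succeed. Lower-casing both sides is one possible fix; this is a sketch, not the patch's actual code:

```haskell
import Data.Char (toLower)
import Data.List (isInfixOf)

-- Lower-case both needle and haystack so capitalisation cannot
-- silently defeat the match.
mentionsNoSuchFile :: String -> Bool
mentionsNoSuchFile e =
  map toLower "No such file or directory" `isInfixOf` map toLower e

main :: IO ()
main = do
  -- The patch's original test: a capitalised needle against a
  -- lower-cased haystack never matches.
  print ("No such file or directory"
           `isInfixOf` map toLower "No such file or directory")
  -- The fixed predicate matches regardless of case.
  print (mentionsNoSuchFile "foo: No Such File Or Directory")
```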
> + | isUrl (cacheSource cache) = case "timeout" `isInfixOf` (map toLower e) of
> + True -> addBadCache . cacheSource $ cache
> + False -> case isTrustedHttp (cacheSource cache) of
> + True -> return ()
> + False -> do
> + let url = if last (cacheSource cache) == '/'
> + then cacheSource cache
> + else (cacheSource cache) ++ "/"
> + rsp <- simpleHTTP (getRequest url )
> + do case rsp of
> + Left _ -> return ()
> + Right response -> if rspCode response /= (2,0,0)
> + then addBadCache . cacheSource $ cache
> + else addTrustedHttp (cacheSource cache) >> return ()
OK earlier we see that this function is called if we receive some
exception when fetching over HTTP.
If the exception is not a timeout, we test if the URL is reachable and
whitelist it if so (the whitelist avoids future testing). But why do we
do this? And doesn't that confuse Darcs into thinking it successfully
fetched a file when it did not? Have you tested the actual case where
you get an exception but not a timeout?
Also, are there not web server configurations where
http://example.com/foo gives you a 404, but
http://example.com/foo/_darcs/foo gives you a result? What happens in
those cases? In other words, does the test of trying to fetch
http://example.com/foo/ actually make sense?
Also, why is it important to normalise the URL to have a trailing slash?
Is it just to avoid duplicate entries in the whitelist?
Code tidiness:
1. It could be useful to look into reducing the amount of indentation
you use. It's a tricky trade-off between too much and not enough,
so I don't really have any answers. Perhaps this style could help here
addToBadCaches e cache
| condition 1 =
...
| condition 2 =
...
It's all about making the code as easy to read as possible.
2. It seems like you should just name a source = cacheSource cache,
which I think will make the code a lot simpler looking
3. Why does addTrustedHttp need to be followed by return () if it already
returns ()?
> + | isSsh (cacheSource cache) = checkSshError (cacheSource cache) (getHttpUrl (cacheSource cache))
One principle I saw off some blog post somewhere is that when you have
a single unit of code, the code in that unit should tend to reside on
the same level of abstraction.
So a function like
if foo
then do low-level-A1
low-level-A2
low-level-A3
else high-level-B
is a bit odd, when you could instead write something like
if foo
then do low-level-A1
low-level-A2
low-level-A3
else do low-level-B1
low-level-B2
OR alternatively
if foo
then do high-level-A
else do high-level-B
But not a mix. I think why this is useful for readability is that it
makes it very clear that A and B are on the same sort of level of
operation. Anyway, don't take this sort of advice /too/ seriously.
It's one of these fuzzy things that I suspect you develop a feel for
over time.
In this particular case, you've chosen to break checkSshError into
a separate function, but not the HTTP stuff. Why?
> + | otherwise = fail $ "unknown transport protocol for: " ++ (cacheSource cache)
> +-- | Helper function to check reachability of a ssh source.
> +checkSshError :: String -> String -> IO ()
> +checkSshError source url = do
> + simpleHTTP (getRequest url )
> + `catchNonSignal` (\e -> do
> + case "timeout" `isInfixOf` (map toLower (show e)) of
> + True -> (addBadCache source) >> return (Left (ErrorMisc "timeout"))
> + False -> return () >> return (Left (ErrorMisc "Unkown error")))
> + return ()
> +-- | Given a SSH source it returns the server address appended with HTTP://
> +-- i.e: getHttpUrl "user@darcs.net:repo" returns "http://darcs.net"
> +getHttpUrl :: String -> String
> +getHttpUrl source = "http://" ++ ((\(_:xs) -> takeWhile (/=':') xs) . dropWhile (/='@') $ source)
Why does checkSshError take both the source and url argument? Is there
a case where url cannot be systematically derived from the source?
What happens to the Either type that you return in your handler?
Doesn't it get discarded?
Why are we going about things in this roundabout way of checking the SSH
path by converting the source name to an HTTP URI and fetching it over
HTTP? What about cases where people have a working SSH server but not
necessarily an HTTP server on a given machine?
> where ffuc (c:cs)
> - | not (writable c) && (Anywhere == fromWhere || isFile (fn c)) =
> + | not (writable c) && (Anywhere == fromWhere || isFile (fn c)) && (not $ badCache c) =
This seems ok although incidentally,
not (badCache c)
is probably simpler than
(not $ badCache c)
> - `catchall` ffuc cs
> + `catchNonSignal` (\e -> do
> + addToBadCaches (show e) c
> + ffuc cs )
> - `catchall` do (fname,x) <- ffuc cs
> - do createCache c subdir
> - createLink fname (fn c)
> - return (fn c, x)
> - `catchall`
> - do gzWriteFilePS (fn c) x `catchall` return ()
> - return (fname,x)
> + `catchNonSignal` (\ e -> do addToBadCaches (show e) c ; (fname,x) <- ffuc cs
> + do createCache c subdir
> + createLink fname (fn c)
> + return (fn c, x)
> + `catchall`
> + do gzWriteFilePS (fn c) x `catchall` return ()
> + return (fname,x))
This block of code actually has very few changes in it, but the fact
would have been more apparent if you tried to avoid the re-indentation
(not always possible).
In this case, you could have
- `catchall` do (fname,x) <- ffuc cs
- do createCache c subdir
- createLink fname (fn c)
...
+ `catchNonSignal` (\ e ->
+ do addToBadCaches (show e) c
+ (fname,x) <- ffuc cs
+ do createCache c subdir
I also took the liberty of simplifying away the semicolon.
> +-- | Checks if a given cache is in the list of bad caches.
> +badCache :: CacheLoc -> Bool
> +badCache = isBadCache . cacheSource
Do you really use this?
> import Data.Maybe ( isJust, catMaybes )
> import Control.Monad ( msum )
> - dff <- fetchFilePS (repo ++ "/" ++ df) Cachable `catchall` return B.empty
> + (timeOut,dff) <- (fetchFilePS (repo ++ "/" ++ df) Cachable >>= \ r -> return (False,r))
> + `catchNonSignal` (\ e -> return (isTimeOut e,B.empty))
This may be more readable using do notation
(timeOut,dff) <- do r <- fetchFilePS (repo ++ "/" ++ df) Cachable
return (False,r)
`catchNonSignal` (\ e -> return (isTimeOut e,B.empty))
The same sort of worry as above. What happens if your exception is not
a timeout? I'm concerned that you succeed here, where what you really
want to do maybe is re-throw the exception.
> -- below is a workaround for servers that don't return a 404 on nonexistent files
> hunk ./src/Darcs/Repository/Format.hs 57
> - rf <- if B.null dff || isJust (BC.elemIndex '<' dff)
[snip]
> - else return $ Right $ parseRepoFormat dff
> + rf <- case timeOut of
> + False ->
> + if B.null dff || isJust (BC.elemIndex '<' dff)
[snip indentation change]
> + else return $ Right $ parseRepoFormat dff
> + True -> return $ Left $ "Failed establishing a connection with "++ repo ++ " Timeout was reached"
> (\e -> return (Left (prettyException e)))
> + isTimeOut e = "timeout" `isInfixOf` map toLower (show e)
It seems like the notion of whether something is a timeout or not (you
check this by looking at the exception text) could be refactored.
I guess it's not a big deal if it is though.
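A shared predicate along the lines suggested might look like the sketch below; note that it still matches on the exception text, which (as discussed above) remains fragile:

```haskell
import Control.Exception (SomeException, toException, ErrorCall (..))
import Data.Char (toLower)
import Data.List (isInfixOf)

-- One shared predicate instead of repeating the string match at
-- each call site.
isTimeOut :: SomeException -> Bool
isTimeOut e = "timeout" `isInfixOf` map toLower (show e)

main :: IO ()
main = do
  print (isTimeOut (toException (ErrorCall "Timeout was reached")))
  print (isTimeOut (toException (ErrorCall "404 not found")))
```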
--
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
For a faster response, please try +44 (0)1273 64 2905.
msg11828 | Author: mornfall | Date: 2010-07-22.18:08:14
Eric Kow <kowey@darcs.net> writes:
> If the exception is not a timeout, we test if the URL is reachable and
> whitelist it if so (the whitelist avoids future testing). But why do we
> do this? And doesn't that confuse Darcs into thinking it successfully
> fetched a file when it did not? Have you tested the actual case where
> you get an exception but not a timeout?
I believe that we do this because a 404 could be quite legitimate: the
repository could be partially lazy but still have a lot of things we
want to use. We don't want to blacklist a source for this reason
alone. We also don't want to inspect a source each time it gives a 404,
since this is not for free, either.
> Also, are there not web server configurations where
> http://example.com/foo gives you a 404, but
> http://example.com/foo/_darcs/foo gives you a result? What happens in
> those cases? In other words, does the test of trying to fetch
> http://example.com/foo/ actually make sense?
>
> Also, why is it important to normalise the URL to have a trailing slash?
> Is it just to avoid duplicate entries in the whitelist?
I suspect that there are webservers that treat foo/ and foo
differently. But I am not saying one is better than the other. Anyway, I
think the right thing to look for is foo/_darcs/hashed_inventory (we
shouldn't be using caches for non-hashed repositories).
I wholeheartedly agree with the indentation issue Eric has pointed out,
too. Please try to fit your code in 80-90 columns if possible; 100 is
acceptable, and 110 is stretching it but could be forgiven if there's a
good reason. :-) Code that is wider is usually hard to read for almost
everyone (or so I guess).
Yours,
Petr.
msg11831 | Author: abuiles | Date: 2010-07-23.01:35:57
Hi Eric,
>
> OK, so after running all of the atExitActions, we print out our
> accumulated warnings about caches (should we worry about what
> happens if one of the atexit actions fails)?
>
Yes, we should care about it, I will fix it.
>
>> +type BadCacheList = [String]
>> +{- NOINLINE _badCacheList -}
>> +_badCacheList :: IORef BadCacheList
>> +_badCacheList = unsafePerformIO $ newIORef []
>> +
>> +addBadCache :: String -> IO ()
>> +addBadCache cache = modifyIORef _badCacheList (cache:)
>> +
>> +getBadCacheList :: IO [String]
>> +getBadCacheList = readIORef _badCacheList
>> +
>> +{- NOINLINE isBadCache -}
>> +isBadCache :: String -> Bool
>> +isBadCache cache = unsafePerformIO $ do badCaches <- getBadCacheList
>> + return (cache `elem` badCaches)
>
> Is it really necessary for this to be String -> Bool?
> Why not String -> IO Bool?
>
> Alternatively, why not IO (String -> Bool)?
>
I used (String -> Bool) basically to escape the IO so that I could
use it easily in the guards.
> I'm also a bit sceptical about the need for a type synonym.
> Maybe just passing around [String] is clearer here.
>
The synonym was to give better insight into what we were getting, but
indeed, it can be inferred from the function name.
>> +type TrustedHttps = [String]
>> +{- NOINLINE _trustedHttpList -}
>> +_trustedHttpList :: IORef TrustedHttps
>> +_trustedHttpList = unsafePerformIO $ newIORef []
>
> So you maintain a list of URLs which you managed to reach the
> first time you contacted them this session.
>
> 'Trust' is probably not the right notion to use here (since I'd
> be thinking more in terms of security or something like that).
> How about "reachable" instead?
>
"reachable" or "reliable" could be the words to express what I wanted.
>> +addTrustedHttp :: String -> IO ()
>> +addTrustedHttp http = modifyIORef _trustedHttpList (http:)
>> +
>> +getTrustedHttps :: IO TrustedHttps
>> +getTrustedHttps = readIORef _trustedHttpList
>> +
>> +{- NOINLINE isTrustedHttp -}
>> +isTrustedHttp :: String -> Bool
>> +isTrustedHttp http = unsafePerformIO $ do trustedHttps <- getTrustedHttps
>> + return (http `elem` trustedHttps)
>
> Same complaint about possibly unnecessary unsafePerformIO. I mean, we
> certainly use it to achieve tracking of some global information in
> Darcs, but it's something we should be be somewhat reluctant to do lest
> Darcs surprise us in the future.
I totally agree that I abused unsafePerformIO; in this case I used it
for the same reason I gave before.
>> +-- | Returns the sources of a given cache.
>> +cacheSource :: CacheLoc -> String
>> +cacheSource (Cache _ _ s) = s
>
> Potentially useful helper function. Seems like we could alternatively
> modify the definition of Cache to use the curly brace syntax.
>
I would totally prefer curly brace syntax; do we have any reason for
not using it?
>> fetchFileUsingCache :: Cache -> HashedDir -> String -> IO (String, B.ByteString)
>> -fetchFileUsingCache = fetchFileUsingCachePrivate Anywhere
>> +fetchFileUsingCache c h s = do result <- fetchFileUsingCachePrivate Anywhere c h s
>> + return result
>
> This change does not seem necessary. What was wrong with the
> eta-reduced version of this code?
>
There is nothing wrong with it; what happened was that I was using
something else in the middle, then took it away and forgot to restore
the eta-reduced version.
>> sfuc [] _ = return ()
>> - sfuc (c:cs) out | not $ writable c =
>> + sfuc (c:cs) out | (not $ badCache c) && (not $ writable c) =
>
>> - then speculateFileOrUrl (fn c) out
>> + then speculateFileOrUrl (fn c) out `catchNonSignal` (\e -> do
>> + addToBadCaches (show e) c)
>
>> - else copyFileOrUrl [] (fn c) out Cachable
>> + else copyFileOrUrl [] (fn c) out Cachable `catchNonSignal` (\e -> do
>> + addToBadCaches (show e) c)
>
> OK, so if something goes wrong when speculating on or copying a file,
> (but it's not just the user hitting ^-c), we note the cache down as
> being problematic.
>
If we determine that the cause of the error is one that we consider
problematic, we add it.
>
> It seems like addToBadCaches is may be a bit of misnomer first this
> function primarily tests if a cache is reachable, and second because it
> adds the cache entry to either the white or the blacklist. But I'm not
> sure; maybe I misunderstand the code. After all, you do only call it
> when something goes wrong...
>
I use the whitelist basically to avoid testing again whether the repo
is reachable (given that I can get an error just because a certain file
is not in that repo, even though the repo does exist).
> Style point:
> The code below is very heavily indented, which makes it harder to read
> and review without a wide screen or window. It's a tricky trade-off
> between too much and not enough, so I don't really have any answers.
> Perhaps this style could help here
>
> addToBadCaches e cache
> | condition 1 =
> ...
> | condition 2 =
> ...
>
It was just a silly idea I had in my mind that I would have to do things this way:
addToBadCaches e cache
| condition 1 =
...
| condition 2 =
....
>
> 2. Do we not have any exception types to work with here? Matching
> on the string seems really fragile.
>
I agree with you. I tried to work with the exception, but unfortunately
"fromException" was giving nothing for this error (I will look at it
again, though).
> 3. Case in point, you're matching "Foo" against (map toLower "foo")
> which means that this code path is never visited.
I used it specifically for "timeout", since the error could show
"Timeout" or "timeout".
>> + | isUrl (cacheSource cache) = case "timeout" `isInfixOf` (map toLower e) of
>> + True -> addBadCache . cacheSource $ cache
>> + False -> case isTrustedHttp (cacheSource cache) of
>> + True -> return ()
>> + False -> do
>> + let url = if last (cacheSource cache) == '/'
>> + then cacheSource cache
>> + else (cacheSource cache) ++ "/"
>> + rsp <- simpleHTTP (getRequest url )
>> + do case rsp of
>> + Left _ -> return ()
>> + Right response -> if rspCode response /= (2,0,0)
>> + then addBadCache . cacheSource $ cache
>> + else addTrustedHttp (cacheSource cache) >> return ()
>
> OK earlier we see that this function is called if we receive some
> exception when fetching over HTTP.
>
> If the exception is not a timeout, we test if the URL is reachable and
> whitelist it if so (the whitelist avoids future testing). But why do we
> do this? And doesn't that confuse Darcs into thinking it successfully
> fetched a file when it did not? Have you tested the actual case where
> you get an exception but not a timeout?
>
I did test it, and the reason I add it to the whitelist is to avoid
inspecting it again (as Petr said).
> Also, are there not web server configurations where
> http://example.com/foo gives you a 404, but
> http://example.com/foo/_darcs/foo gives you a result? What happens in
> those cases? In other words, does the test of trying to fetch
> http://example.com/foo/ actually make sense?
>
> Also, why is it important to normalise the URL to have a trailing slash?
> Is it just to avoid duplicate entries in the whitelist?
>
I added the trailing slash basically for the reason Petr gave: some
servers treat foo/ and foo differently. But as Petr suggests, what I
should test is whether foo/_darcs/hashed_inventory exists.
> Code tidiness:
>
> 1. It could be useful to look into reducing the amount of indentation
> you use. It's a tricky trade-off between too much and not enough,
> so I don't really have any answers. Perhaps this style could help here
>
> addToBadCaches e cache
> | condition 1 =
> ...
> | condition 2 =
> ...
>
> It's all about making the code as easy to read as possible.
>
> 2. It seems like you should just name a source = cacheSource cache,
> which I think will make the code a lot simpler looking
>
I'll rewrite it thanks :).
>
> In this particular case, you've chosen to break checkSshError into
> a separate function, but not the HTTP stuff. Why?
>
I did it just because it was getting too long (yes, I tried to keep the
columns <= 80, but I had that clumsy idea I mentioned before).
>
>> +-- | Given a SSH source it returns the server address appended with HTTP://
>> +-- i.e: getHttpUrl "user@darcs.net:repo" returns "http://darcs.net"
>> +getHttpUrl :: String -> String
>> +getHttpUrl source = "http://" ++ ((\(_:xs) -> takeWhile (/=':') xs) . dropWhile (/='@') $ source)
>
>
> Why does checkSshError take both the source and url argument? Is there
> a case where url cannot be systematically derived from the source?
>
None; in fact the url is inferred from the source.
> What happens to the Either type that you return in your handler?
> Doesn't it get discarded?
>
Yes, it is discarded; I wasn't very happy with that, and I realize it
is wrong.
> Why are we going about things in this roundabout way of checking the SSH
> path by converting the source name to an HTTP URI and fetching it over
> HTTP? What about cases where people have a working SSH server but not
> necessarily an HTTP server on a given machine?
>
You are right here, and I realized today that this is not the way I
should test that the ssh source is reachable (I was just assuming that
every server would give a 200 when fetching over HTTP; I forgot about
cases like the one you mentioned). I didn't use the error message in
this case, because with ssh we don't get an error like the one with
http (i.e. "timeout reached"); I will look more closely at how I can
handle that error. Also, in this patch the ssh check "wrongly" only
looks for a timeout error; I didn't check whether the repo was no
longer there. I guess I would have to do something like
"existRemoteFileSsh" and check for foo/_darcs/hashed_inventory.
>
> Do you really use this?
>> import Data.Maybe ( isJust, catMaybes )
>> import Control.Monad ( msum )
>
That's not mine. I just didn't clean that up, as I think that kind of
thing should go in a different patch.
>
> (timeOut,dff) <- do r <- fetchFilePS (repo ++ "/" ++ df) Cachable
> return (False,r)
> `catchNonSignal` (\ e -> return (isTimeOut e,B.empty))
>
> The same sort of worry as above. What happens if your exception is not
> a timeout? I'm concerned that you succeed here, where what you really
> want to do maybe is re-throw the exception.
>
If the exception is not a timeout, there is still a possibility that
repo++"/"++darcsdir++"/inventory" exists; if not, we throw the final
error.
Now I am unsure how to proceed with the ssh server; I mean, how could I
check that the SSH server is reachable, or that _darcs/hashed_inventory
exists? A first idea that comes to mind is maybe shelling out to ssh,
but I'm not sure about portability.
Thanks !
--
Adolfo
msg11837 | Author: kowey | Date: 2010-07-23.11:50:46
On Thu, Jul 22, 2010 at 20:38:35 -0500, Adolfo Builes wrote:
> > Is it really necessary for this to be String -> Bool?
> > Why not String -> IO Bool?
> >
> > Alternatively, why not IO (String -> Bool)?
> >
> I used (String -> Bool) basically to escape the IO so that I could
> use it easily in the guards.
...
> I totally agree that I abused of unsafePerformIO, in this case I used
> for the same reason I gave before.
I think an IO (String -> Bool) approach could maybe still work because
you can grab the (String -> Bool) function within IO, and then use it
within a guard.
You'll have to decide if minimising the unsafePerformIO is worth the
extra code complexity or not. I tend to say it is, but others may
disagree with me.
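The IO (String -> Bool) approach Eric describes reads the list once inside IO; the resulting pure predicate is then freely usable in guards. A self-contained sketch, where the local IORef again stands in for the global list:

```haskell
import Data.IORef (newIORef, readIORef)

main :: IO ()
main = do
  badCaches <- newIORef ["ssh://gone.example"]  -- stand-in for _badCacheList
  -- Grab the pure predicate within IO...
  isBad <- fmap (flip elem) (readIORef badCaches)
  -- ...then use it freely in guards.
  let describe c | isBad c   = c ++ ": skipped (bad cache)"
                 | otherwise = c ++ ": ok"
  putStrLn (describe "ssh://gone.example")
  putStrLn (describe "http://darcs.net")
```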
> >> +cacheSource (Cache _ _ s) = s
> >
> > Potentially useful helper function. Seems like we could alternatively
> > modify the definition of Cache to use the curly brace syntax.
> >
> I would totally prefer curly brace syntax, do we have any reason for
> not using it ?
I'd say go for it. I trust you mean something like
data Cache = Cache { ...
, cacheSource :: String }
> I use the whitelist basically to don't test again if the repo is
> reachable or not. ( Given the fact that I can get an error just
> because certain file is not in that repo, but the repo does exist )
OK, that makes some sense, that there may be a legitimate 404 on a
file that does not exist, but the repository is reachable. Also, would
it be necessary to worry about other HTTP status codes, such as the
redirect ones (301 and 302?). Is there a more robust way to do this?
[Being sort of an abstract, high-level kind of guy, working on Darcs
has been a wonderful way for me to learn how things work a bit more
on the lower level]
> I agree with you, I tried to work with the exception, unfortunately
> "fromException" was giving nothing to this error ( I will look at it
> again though).
Thanks. System.IO.Error may help.
> > 3. Case in point, you're matching "Foo" against (map toLower "foo")
> > which means that this code path is never visited.
> I used specifically for "timeout" since the error could show "Timeout"
> or "timeout".
Sorry, I was referring to this:
case "No such file or directory" `isInfixOf` (map toLower e) of
> > If the exception is not a timeout, we test if the URL is reachable and
> > whitelist it if so (the whitelist avoids future testing). But why do we
> > do this? And doesn't that confuse Darcs into thinking it successfully
> > fetched a file when it did not? Have you tested the actual case where
> > you get an exception but not a timeout?
> >
> I did test it, and the reasons I add it to the white list, is to avoid
> inspecting again ( as Petr said).
Do you test it automatically?
> I added the trailing slash basically because of the reason Petr said,
> cause some servers treats foo/ and foo differently, but as Petr
> suggest what I should test is if foo/_darcs/hashed_inventory exist.
Yes.
> > In this particular case, you've chosen to break checkSshError into
> > a separate function, but not the HTTP stuff. Why?
> >
> I did it just because it was getting too long ( yes, I tried to keep
> the columns <= 80, but I had that clumsy idea I mentioned before)
OK, well what I would suggest is to break the local and HTTP stuff out too if
you're going to break the SSH stuff out (now I may decide I was wrong after
looking at the results of this, because I don't yet have a good feel about
what is most readable)...
> I guess I would have to do something like
> "existRemoteFileSsh" and check for foo/_darcs/hashed_inventory
Seems fair enough.
> > Do you really use this?
> >> import Data.Maybe ( isJust, catMaybes )
> >> import Control.Monad ( msum )
> >
> That's not mine. I just didn't clean that up, as I think that kind of
> things should go in a different patch.
I was referring to badCache :: CacheLoc -> Bool
This is one of these subtle and very subjective issues that I don't really
have an answer for, but I have a feeling that there is sometimes a point
where you're better off redoing application or composition than writing a
helper function. An example of this would be a hypothetical
putStrLnError = hPutStrLn stderr
(In fact, I use this sort of thing in some of my own code). Seems like
there the cost of the extra helper function to think about is higher
than that of just repeating 'hPutStrLn stderr'. Maybe somebody else has
clearer thinking on this tradeoff. These things aren't big deals, of course.
Maybe there is no right answer and there's no point talking about them.
> > The same sort of worry as above. What happens if your exception is not
> > a timeout? I'm concerned that you succeed here, where what you really
> > want to do maybe is re-throw the exception.
>
> If the exception is not a timeout, it could still be possible that
> "repo++"/"++darcsdir++"/inventory" exists; if not, we throw the
> final error.
OK, I have to admit I don't really understand the problem yet (I think Petr
does) so I can trust the two of you. What I was worried about was that you
were just absorbing the final error instead of throwing it. Are you sure
you're actually throwing it?
> Now I'm unsure how to handle the SSH server; I mean, how could I check
> that the SSH server is reachable or that _darcs/hashed_inventory
> exists? A first idea that came to mind was to use ssh directly, but
> I'm not sure about portability.
Why not use the pre-existing copyFileOrUrl (or was it fetch? I'm not
too sure I remember the difference) functions?
--
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
For a faster response, please try +44 (0)1273 64 2905.
|
msg11839 (view) |
Author: ganesh |
Date: 2010-07-23.11:58:05 |
|
Eric Kow wrote:
> On Thu, Jul 22, 2010 at 20:38:35 -0500, Adolfo Builes wrote:
>>> Is it really necessary for this to be String -> Bool?
>>> Why not String -> IO Bool?
>>>
>>> Alternatively, why not IO (String -> Bool)?
>>>
>> I used (String -> Bool) basically to escape the IO so that I could
>> use it easily in the guards.
>
> ...
>
>> I totally agree that I abused unsafePerformIO; in this case I used it
>> for the same reason I gave before.
>
> I think an IO (String -> Bool) approach could maybe still work
> because you can grab the (String -> Bool) function within IO, and
> then use it within a guard.
>
> You'll have to decide if minimising the unsafePerformIO is worth the
> extra code complexity or not. I tend to say it is, but others may
> disagree with me.
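[Editor's note: Eric's `IO (String -> Bool)` idea can be sketched as below. `mkIsBadSource` and `firstGoodSource` are hypothetical names for illustration; the actual darcs code keeps the bad-source list in a global IORef rather than passing one in.]

```haskell
import Data.IORef (IORef, newIORef, readIORef)

-- Obtain the pure predicate once inside IO; after that it can be used
-- in ordinary guards without any unsafePerformIO.
mkIsBadSource :: IORef [String] -> IO (String -> Bool)
mkIsBadSource ref = do
  badSources <- readIORef ref
  return (`elem` badSources)

-- The predicate now drives guards in a pure helper.
firstGoodSource :: (String -> Bool) -> [String] -> Maybe String
firstGoodSource badSource (s:ss)
  | badSource s = firstGoodSource badSource ss
  | otherwise   = Just s
firstGoodSource _ [] = Nothing
```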
I'm totally against use of unsafePerformIO simply to reduce local code complexity. Where we use it, it's to deal with _global_ issues like needing to thread state or log messages everywhere.
Ganesh
|
msg11895 (view) |
Author: abuiles |
Date: 2010-07-28.13:47:46 |
|
1 patch for repository http://darcs.net:
Wed Jul 28 08:07:31 COT 2010 builes.adolfo@googlemail.com
* Resolve issue 1599: automatically expire unused caches
Attachments
|
msg11898 (view) |
Author: kowey |
Date: 2010-07-28.17:46:38 |
|
Just some quick English and Hlinting (not a proper review yet)
Resolve issue 1599: automatically expire unused caches
-------------------------------------------------------
> + hPutStrLn stderr $ "\nI could not reach the following " ++
> + englishNum (length sources) (Noun "repository") " (listed in _darcs/prefs/sources):"
> + hPutStrLn stderr $ (unlines sources) ++ "\nIt is recommendable to delete the " ++
> + englishNum (length sources) (Noun "entry") " listed aboved if not used"
How about this?
I could not reach the repositories:
foo
bar
baz
quux
If you're not using them, you should probably delete the
corresponding entries from _darcs/prefs/sources
> +isBadSource :: IO (String -> Bool)
> +isBadSource = do badSources <- getBadSourcesList
> + return (`elem` badSources)
Thanks for getting rid of the needless unsafePerformIO
> +checkCacheReachability :: String -> CacheLoc -> IO ()
> +checkCacheReachability e cache
> + | isFile (source cache) = do
> + reachable <- isReachableSource
> + if not $ reachable (source cache)
> + then
> + do
> + exist <- doesDirectoryExist $ source cache
> + when (not exist) $ addBadSource $ source cache
> + when exist $ addReachableSource $ source cache
> + else
> + return ()
I think you want
unless (reachable (source cache)) $ do
You could also maybe refactor this a little bit with a helper function
that takes the (String -> Bool) function as an argument. Not sure if
that would be a good thing.
> + else if reachable $ source cache
> + then return ()
> + else
see above
> + | otherwise = fail $ "unknown transport protocol for: " ++ source cache
those spaces are a bit superfluous
Overall, I think your code is looking a lot cleaner now!
--
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
For a faster response, please try +44 (0)1273 64 2905.
|
msg11899 (view) |
Author: abuiles |
Date: 2010-07-28.20:55:19 |
|
1 patch for repository http://darcs.net:
Wed Jul 28 15:37:43 COT 2010 builes.adolfo@googlemail.com
* Resolve issue 1599: automatically expire unused caches
Attachments
|
msg11905 (view) |
Author: kowey |
Date: 2010-08-01.18:33:09 |
|
Sorry, Adolfo,
I've been remiss in my reviewing duty.
Anyway, I've interspersed Eric-reads-the-code style comments with
comments to the patch author. The latter are highlighted with
REMARK
Resolve issue 1599: automatically expire unused caches
======================================================
> + badSources <- getBadSourcesList
> + when ( not $ null badSources ) $ reportBadSources badSources
So here we're modifying the withAtExit function directly to report on
the caches that we could not reach.
REMARK: An alternative approach may be to register this as an exit
action via atexit, which seems a bit cleaner/more modular to me. I
guess the component that would register this would be the main function
for darcs. On the other hand, I suppose one advantage of baking this
into withAtExit is that we control the order of execution; this fires
after all the actions have been run. Are there other reasons? If we
don't care about the order of execution so much, I'd lean more towards
the more modular approach
> + reportBadSources sources = do
> + hPutStderr $ "\nI could not reach the following " ++
> + englishNum (length sources) (Noun "repository") ":"
> + hPutStderr $ (unlines sources) ++ "If you're not using " ++ itThem ( length sources) ++
> + ", you should probably delete\nthe corresponding " ++
> + englishNum (length sources) (Noun "entry") " from _darcs/prefs/sources."
> + hPutStderr = hPutStrLn stderr
> + itThem num = case num > 1 of
> + False -> "it"
> + True -> "them"
REMARK: Nice touch on the pronoun. This could probably go into the
English module. You could even make a Pronoun type which implements the
Countable instance, which allows you to reuse englishNum.
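[Editor's note: the Pronoun idea can be sketched as below. These are standalone stand-ins; the real Countable class and englishNum in the darcs English module may have a different shape, and the naive pluralisation here is only for illustration.]

```haskell
-- Hypothetical stand-ins for the English module's machinery.
class Countable a where
  singular :: a -> String
  plural   :: a -> String

newtype Noun = Noun String
instance Countable Noun where
  singular (Noun s) = s
  plural   (Noun s) = s ++ "s"   -- naive pluralisation, illustration only

data Pronoun = It
instance Countable Pronoun where
  singular It = "it"
  plural   It = "them"

-- Reuses the same Countable dispatch for nouns and pronouns alike.
englishNum :: Countable a => Int -> a -> String -> String
englishNum n x rest = (if n == 1 then singular x else plural x) ++ rest
```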
> +{- NOINLINE _badSourcesList -}
> +_badSourcesList :: IORef [String]
> +_badSourcesList = unsafePerformIO $ newIORef []
> +
> +addBadSource :: String -> IO ()
> +addBadSource cache = modifyIORef _badSourcesList (cache:)
> +
> +getBadSourcesList :: IO [String]
> +getBadSourcesList = readIORef _badSourcesList
> +
> +isBadSource :: IO (String -> Bool)
> +isBadSource = do badSources <- getBadSourcesList
> + return (`elem` badSources)
Looks fine; similar for reachableSources
> -data CacheLoc = Cache !CacheType !WritableOrNot !String
> +data CacheLoc = Cache { cacheType:: !CacheType, writableOrNot:: !WritableOrNot, source:: !String }
REMARK: I tend to prefix names of field accessors, which makes the names
uglier but reduces the chances I'll need to qualify them later on, for
example.
data CacheLoc = Cache { cacheType:: !CacheType
, cacheWritable :: !WritableOrNot
, cacheSource:: !String }
copyFileUsingCache
------------------
> - sfuc cache stickItHere
> + badSource <- isBadSource
> + sfuc cache stickItHere badSource
> - sfuc [] _ = return ()
> - sfuc (c:cs) out | not $ writable c =
> + sfuc [] _ _ = return ()
> + sfuc (c:cs) out badSource | not (badSource (source c)) && not (writable c) =
> if oos == OnlySpeculate
This new version of sfuc makes use of the list of unreachable sources.
REMARK: Perhaps another approach you could take is to just apply a
filter on the cache, maybe changing it to
case filter (\c -> not (badSource (source c) || writable c)) cs of
[] -> return ()
(c:_) -> sfuc c stickItHere
Personally, I'd just drop the guards and use an if-then-else for clarity
> hunk ./src/Darcs/Repository/Cache.hs 222
> - then speculateFileOrUrl (fn c) out
> + then speculateFileOrUrl (fn c) out `catchNonSignal` (\e ->
> + checkCacheReachability (show e) c)
> - else copyFileOrUrl [] (fn c) out Cachable
> - | otherwise = sfuc cs out
> + else copyFileOrUrl [] (fn c) out Cachable `catchNonSignal` (\e ->
> + checkCacheReachability (show e) c)
OK, so the other change being that if something should go wrong when
speculating or trying to copy a file, you check to see if the cache
is alright.
> + | otherwise = sfuc cs out badSource
REMARK: The indentation for this otherwise case is now a bit off, which
makes the code harder to understand. But it would be a moot point if
you went with the filter approach.
checkCacheReachability
-----------------------
> +-- | Checks if a given cache entry is reachable or not.
> +-- It receives an error caught during execution and the cache entry.
> +-- For a local cache, if the given source doesn't exist anymore, it is added.
> +-- For HTTP sources if the error is timeout, it is added, if not we check for the
> +-- _darcs/hashed_inventory file, if doesn't exist it means we are pointing to a repository
> +-- which used to exist there, but had been moved.
> +-- For SSH if we get an error we try to get the file _darcs/hashed_inventory again, if it fails
> +-- we add the entry to the list of sources which are no reachables.
> +-- The entries which get added to the cache are no longer tried for the rest of the command.
> +checkCacheReachability :: String -> CacheLoc -> IO ()
> +checkCacheReachability e cache
> + | isFile (source cache) = do
> + reachable <- isReachableSource
> + unless (reachable (source cache)) $ do
> + exist <- doesDirectoryExist $ source cache
> + unless exist $ addBadSource $ source cache
> + when exist $ addReachableSource $ source cache
If a source has already been whitelisted, we don't need to check it;
otherwise, we either whitelist or blacklist it depending on whether
the directory exists.
REMARK: I think you can share the source cache expression using a where
clause which will be common to all the guards.
REMARK: These last two cases look like they could be more simply
expressed using an if-then-else
if exist
then addReachableSource (source cache)
else addBadSource (source cache)
> + | isUrl (source cache) = do
> + reachable <- isReachableSource
> + let string = case dropWhile (/='(') e of
> + (_:xs) -> fst (break (==')') xs)
> + _ -> e
> + let cerror = case reads string ::[(HTTP.ConnectionError,String)] of
> + [(ce,_)] -> Just ce
> + _ -> Nothing
REMARK: We're still parsing error strings here, which seems rather
fragile to me. Weren't we going to capture status codes from curl or
something? It's an improvement at least that the error code parsing is
in one place and not scattered throughout the darcs source.
> + if cerror /= Nothing
> + then addBadSource $ source cache
> + else
> + unless (reachable (source cache)) $
> + withTemp $ \tempout -> do
> + let f = source cache ++ "/" ++darcsdir ++ "/" ++ "hashed_inventory"
> + copyFileOrUrl [] f tempout Cachable
> + addReachableSource $ source cache
> + `catchNonSignal` (\_ -> addBadSource $ source cache)
So the logic here is:
If we get a connection error of some sort, then we add the cache to the
blacklist, even if it was already previously whitelisted.
If it's not a connection error that we got, and we have not already
whitelisted the server, then we test for reachability by trying to
fetch the hashed_inventory file. If the server is indeed reachable
it is whitelisted so we don't bother checking again; otherwise, we
add it to the blacklist so we know not to fetch from it again.
REMARK: It seems like you have a situation where an item could be
added to the blacklist even though it's already whitelisted. Is
that OK? Could there be some sort of unpredictable behaviour when
an entry is in both lists? Perhaps one way to think about this -- not
necessarily the right way, mind you -- is that
* If an entry is in the whitelist, it can be added to the blacklist
(but then it's removed from the whitelist). This would apply to
cases where a server goes down in the middle of a fetch.
* If an entry is in the blacklist, and you try to add it to the
whitelist, we have an error (bug in darcs...)
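[Editor's note: the invariant proposed in the two bullets above can be sketched in isolation like this. `SourceLists` and both functions are hypothetical names, not darcs code, which actually stores the two lists in separate global IORefs.]

```haskell
import Data.List (delete)

-- Blacklisting evicts the entry from the whitelist; whitelisting a
-- blacklisted entry is flagged as a bug instead of silently succeeding.
data SourceLists = SourceLists { whiteList :: [String]
                               , blackList :: [String] }
  deriving (Eq, Show)

blacklistSource :: String -> SourceLists -> SourceLists
blacklistSource s (SourceLists w b) = SourceLists (delete s w) (s : b)

whitelistSource :: String -> SourceLists -> Either String SourceLists
whitelistSource s sl@(SourceLists _ b)
  | s `elem` b = Left ("bug in darcs: whitelisting blacklisted source " ++ s)
  | otherwise  = Right sl { whiteList = s : whiteList sl }
```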
REMARK: we're baking in the assumption that hashed repositories have a
hashed_inventory file... hopefully this baked-in assumption won't bite
us in the long run. I guess there's not much we can do about this.
REMARK: You may find the isJust function to be useful.
> + | isSsh (source cache) =
> + withTemp $ \tempout -> do
> + let f = source cache ++ "/" ++darcsdir ++ "/" ++ "hashed_inventory"
> + copyFileOrUrl [] f tempout Cachable
> + addReachableSource $ source cache
> + `catchNonSignal` (\_ -> addBadSource $ source cache)
REMARK: It looks here like you have some duplication with the HTTP code,
maybe you could refactor this into a helper function like checkRemote.
REMARK: You seem to have omitted the case where the cache entry is
already whitelisted. (Seems like that logic could go into checkRemote)
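[Editor's note: the suggested checkRemote refactor might look like the sketch below. The darcs functions it would call (isReachableSource, the hashed_inventory probe, addReachableSource, addBadSource) are abstracted into parameters here, so only the shared shape of the HTTP and SSH branches is shown.]

```haskell
import Control.Monad (unless)
import Data.IORef (newIORef, modifyIORef, readIORef)  -- used by the usage example

-- Skip sources that are already whitelisted; otherwise probe the remote
-- repository and whitelist or blacklist it based on the outcome.
checkRemote :: (String -> Bool)    -- ^ already whitelisted?
            -> (String -> IO Bool) -- ^ can we fetch _darcs/hashed_inventory?
            -> (String -> IO ())   -- ^ whitelist the source
            -> (String -> IO ())   -- ^ blacklist the source
            -> String              -- ^ cache source
            -> IO ()
checkRemote reachable probe whitelist blacklist src =
  unless (reachable src) $ do
    ok <- probe src
    if ok then whitelist src else blacklist src
```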
> + | otherwise = fail $ "unknown transport protocol for: " ++ source cache
fetchFileUsingCachePrivate
--------------------------
REMARK: What's the difference between copyFileUsingCache and
fetchFileUsingCachePrivate? I think it's useful for you to know,
since you're modifying these two functions.
> - ffuc cache
> + badSource <- isBadSource
> + ffuc cache badSource
> - `catchall` debugFail ("Couldn't fetch `"++f++"'\nin subdir "++(hashedDir subdir)++
> + `catchall` debugFail ("Couldn't fetch `"++f++"'\nin subdir "++ hashedDir subdir ++
REMARK: This should be in a separate patch. (Don't worry too much about
this, just being super-picky about minimal patches)
> hunk ./src/Darcs/Repository/Cache.hs 286
> - where ffuc (c:cs)
> - | not (writable c) && (Anywhere == fromWhere || isFile (fn c)) =
> + where ffuc (c:cs) badSource
> + | not (writable c) && (Anywhere == fromWhere || isFile (fn c)) && not (badSource (source c)) =
REMARK: Similar remark as with copyFileUsingCache. Maybe we could just
filter out the bad sources and leave the ffuc logic intact.
> do debugMessage $ "In fetchFileUsingCachePrivate I'm going manually"
> debugMessage $ " getting "++f
> debugMessage $ " from " ++ fn c
> hunk ./src/Darcs/Repository/Cache.hs 299
> fail $ "Hash failure in " ++ fn c
> return (fn c, x')
> else return (fn c, x) -- FIXME: create links in caches
> - `catchall` ffuc cs
> + `catchNonSignal` (\e -> do
> + checkCacheReachability (show e) c
> + foo <- isBadSource
OK, so if something goes wrong during fetchFileUsingCachePrivate, we
need to checkCacheReachability again. This can modify the white and
black lists so we need to pass an updated isBadSource.
REMARK: wait a second, why do we need to care about the updated
isBadSource? It's not like sources reappear in the list, do they?
REMARK: Actually naming your variables 'foo' is probably a bad idea
(particularly since foo tends to be used casually to refer to some
hypothetical variable, not an actual one)
> - `catchall` do (fname,x) <- ffuc cs
> - do createCache c subdir
> - createLink fname (fn c)
> - return (fn c, x)
> - `catchall`
> - do gzWriteFilePS (fn c) x `catchall` return ()
> - return (fname,x)
> - | otherwise = ffuc cs
> + `catchNonSignal` (\ e ->
> + do
> + checkCacheReachability (show e) c
> + foo <- isBadSource
> + (fname,x) <- ffuc cs foo
> + do createCache c subdir
> + createLink fname (fn c)
> + return (fn c, x)
> + `catchall`
> + do gzWriteFilePS (fn c) x `catchall` return ()
> + return (fname,x))
Just a similar change; if something goes wrong we update the black/white
lists accordingly and try from the subsequent cache.
REMARK: I'm still worried about this moving from catchall to
catchNonSignal... Are we relying on any of these handlers doing
something clever when the user kills darcs ?
> - where handler' se =
> + where handler' se = do
REMARK: this noise does not belong in the patch :-)
HTTP module
-----------
> +data ConnectionError = CouldntResolveHost |
> + CouldntConnectToServer |
> + OperationTimeout
> + deriving (Eq, Read, Show)
REMARK: I would avoid the contraction here and call it
CouldNotResolveHost, etc. I can't really say why; I guess
some part of me feels it's clearer that way, or some part of
me wants ADTs and haddocks to be fairly formal.
> -waitNextUrl :: IO (String, String)
> +waitNextUrl :: IO (String, String, Maybe ConnectionError)
REMARK: It may be worth thinking about just replacing that second String
with Maybe ConnectionError... (not sure how wise this is in practice)
> #ifdef HAVE_HTTP
>
> hunk ./src/HTTP.hs 96
> waitNextUrl = do
> (u, f) <- readIORef requestedUrl
> if null u
> - then return ("", "No URL requested")
> + then return ("", "No URL requested", Nothing)
> else do writeIORef requestedUrl ("", "")
> e <- (fetchUrl u >>= \s -> B.writeFile f (BC.pack s) >> return "") `catch` h
> hunk ./src/HTTP.hs 99
> - return (u, e)
> + let ce = case e of
> + "timeout" -> Just OperationTimeout
> + _ -> Nothing
It looks like this e string is always an error.
REMARK: I guess there's not much more we can do than trust the strings
here. Have you tested this with the HTTP module?
URL module
----------
Phew, this seems a bit confusing. I think the URL module contains both
high-level code and curl-specific code which is #ifdef'ed out.
Seems like the low-level Curl code should be moved to its own module.
> +
> +
REMARK: another irrelevant change, grumble :-P
> hunk ./src/URL.hs 223
> let l = pipeLength st
> when (l > 0) $ do
> dbg "URL.waitNextUrl start"
> - (u, e) <- liftIO $ waitNextUrl'
> + (u, e, ce) <- liftIO $ waitNextUrl'
> let p = inProgress st
> new_st = st { inProgress = Map.delete u p
> , pipeLength = l - 1 }
> hunk ./src/URL.hs 238
> else case Map.lookup u p of
> Just (f, _, _) -> do
> removeFileMayNotExist (f++"-new_"++randomJunk st)
> - downloadComplete u e
> + case ce of
> + Just httpError -> downloadComplete u (show httpError)
> + Nothing -> downloadComplete u e
> debugMessage $ "URL.waitNextUrl failed: "++
> u++" "++f++" "++e
> Nothing -> bug $ "Another possible bug in URL.waitNextUrl: "++u++" "++e
This is the high-level waitNextUrl, extended to make use of high-level
errors or fall back to string type errors in our reporting.
REMARK: does this mean the two cannot co-exist? If not, the type
signature of waitNextUrl' could probably reflect that somehow, maybe
using Either or another type like
HttpStatus = HttpOK
| HttpConnectionError ConnectionError
| HttpOtherError String
> - unless (null e) (debugFail $ "Failed to download URL "++u++": "++e)
> + unless (null e) $ do
> + debugMessage $ "Failed to download URL "++u++": "++e
> + fail e
REMARK: irrelevant change (I understand this may be a bit of a pain, but
any effort you could put into making your patches easy to understand
pays off)
darcs revert and darcs record are interactive... (although if you find
them a bit of a pain to use, there could be a UI discussion there...)
> -waitNextUrl' :: IO (String, String)
> +waitNextUrl' :: IO (String, String, Maybe ConnectionError)
This looks like the curl version of waitNextUrl'
> waitNextUrl' = do
> e <- curl_wait_next_url >>= peekCString
> + ce <- if not (null e)
> + then do
> + errorNum <- curl_last_error_num
> + case errorNum of
> + 6 -> return $ Just CouldntResolveHost
> + 7 -> return $ Just CouldntConnectToServer
> + 29 -> return $ Just OperationTimeout
> + _ -> return Nothing
> + else
> + return Nothing
Hmm, I'll take this on faith.
REMARK: potential minor refactor
return $ case errorNum of
6 -> Just CouldntResolveHost
7 -> Just CouldntConnectToServer
29 -> Just OperationTimeout
_ -> Nothing
> +foreign import ccall "hscurl.h curl_last_error_num"
> + curl_last_error_num :: IO CInt
Exciting; I keep avoiding the FFI for some reason.
hscurl.c
--------
> +static CURLcode errornum = 0;
>
> const char *curl_wait_next_url()
> {
> + errornum = -1;
> +
>
> - if (result != CURLE_OK)
> + if (result != CURLE_OK){
> + errornum = result;
> return curl_easy_strerror(result);
> + }
REMARK: Hmm, this sort of stateful way of doing things tends to make me
nervous (also, do we need to worry about what happens when you call
wait_next_url several times in parallel?). Could we not return a tuple
instead? I guess we'd have to find a way to deal with structs in the
FFI?
--
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
For a faster response, please try +44 (0)1273 64 2905.
|
msg11906 (view) |
Author: kowey |
Date: 2010-08-01.18:37:41 |
|
On Sun, Aug 01, 2010 at 18:33:11 +0000, Eric Kow wrote:
> REMARK: Hmm, this sort of stateful way of doing things tends to make me
> nervous (also, do we need to worry about what happens when you call
> wait_next_url several times in parallel?). Could we not return a tuple
> instead? I guess we'd have to find a way to deal with structs in the
> FFI?
I cut out too much context here.
I was referring to this change:
+const int *curl_last_error_num();
which is how we grab the error code from our last attempt to fetch a
URL. I was asking if there is a way we could somehow return this in the
fetch function. Solutions I can think of with my
non-C-programmer-limited brain (*) are some kind of struct or maybe
passing in the error code as a pointer... Maybe somebody wiser can
chime in.
(*) Yeah yeah, kids these days...
--
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
For a faster response, please try +44 (0)1273 64 2905.
|
msg11921 (view) |
Author: abuiles |
Date: 2010-08-03.05:50:38 |
|
Hi Eric,
I'm almost done writing the modifications to the patch. I have a small
issue, though, with:
>
> REMARK: An alternative approach may be to register this as an exit
> action via atexit, which seems a bit cleaner/more modular to me. I
> guess the component that would register this would be the main function
> for darcs. On the other hand, I suppose one advantage of baking this
> into withAtExit is that we control the order of execution; this fires
> after all the actions have been run. Are there other reasons? If we
> don't care about the order of execution so much, I'd lean more towards
> the more modular approach
>
I registered the action using atexit, and it works, but it happens that
it calls the action twice, so I end up with the message repeated.
This is the part where I added the atexit
hunk ./src/darcs.hs 54
+ atexit reportBadSources
hunk ./src/darcs.hs 77
-
+ where
+ reportBadSources = do
+ sources <- getBadSourcesList
+ when ( not $ null sources ) $ do
+ hPutStderr $ "\nI could not reach the following " ++
+ englishNum (length sources) (Noun "repository") ":"
+ hPutStderr $ (unlines sources) ++ "If you're not using " ++
+ englishNum (length sources) Pronoun ", you should
probably delete\nthe corresponding " ++
+ englishNum (length sources) (Noun "entry") " from
_darcs/prefs/sources."
+ hPutStderr = hPutStrLn stderr
Any idea what is going on and why the action gets called twice?
--
Adolfo
|
msg11923 (view) |
Author: kowey |
Date: 2010-08-03.10:25:32 |
|
On Tue, Aug 03, 2010 at 00:53:18 -0500, Adolfo Builes wrote:
> I register the action using atexit, and it works, but happens that it
> calls the action twice, so I end with the message repeated.
Perhaps it's being registered more than once? To debug this, you could
modify atexit to take a string along with an action and count up the
number of instances of each string.
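[Editor's note: the debugging idea above could be sketched like this. `atexitLabeled` and `registrationCounts` are hypothetical, not the real Darcs.Global.atexit, which takes a plain IO () and keeps no labels.]

```haskell
import Data.IORef (IORef, newIORef, modifyIORef, readIORef)
import qualified Data.Map as Map

-- Each exit action is registered under a label; a doubled report shows
-- up as a count of 2 for its label.
type ExitActions = IORef (Map.Map String (Int, IO ()))

atexitLabeled :: ExitActions -> String -> IO () -> IO ()
atexitLabeled ref label action =
  modifyIORef ref (Map.insertWith bump label (1, action))
  where bump (n, a) (m, _) = (n + m, a)  -- keep newest action, sum counts

registrationCounts :: ExitActions -> IO [(String, Int)]
registrationCounts ref =
  map (\(l, (n, _)) -> (l, n)) . Map.toList <$> readIORef ref
```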
--
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
For a faster response, please try +44 (0)1273 64 2905.
|
msg11926 (view) |
Author: abuiles |
Date: 2010-08-03.14:32:49 |
|
1 patch for repository http://darcs.net:
Tue Aug 3 09:15:16 COT 2010 builes.adolfo@googlemail.com
* Resolve issue 1599: automatically expire unused caches
Attachments
|
msg11975 (view) |
Author: kowey |
Date: 2010-08-05.13:37:59 |
|
Hi Adolfo,
OK! Only one more iteration with me. In my opinion, after your next
revision (sorry! it's mostly nitpicks, and feel free to ignore me if
I'm saying things which are evidently stupid or just subjective), we
should send your bundle to Somebody Else to review for correctness.
I'm bound to have missed the forest for the trees somewhere.
We've seen most of this code before, so I'll only comment on what I
think are the new bits (or any old bits that may need modifying still)
Resolve issue 1599: automatically expire unused caches
------------------------------------------------------
> - sfuc cache stickItHere
> + badSource <- isBadSource
> + let cacheFiltered = filter (\ cacheEntry -> not . badSource $ cacheSource cacheEntry ) cache
> + sfuc cacheFiltered stickItHere
FIXME: This is one place that point-free style could produce nicer code
> - sfuc (c:cs) out | not $ writable c =
>
> - | otherwise = sfuc cs out
> + sfuc (c:cs) out
> + | not (writable c) =
...
> + | otherwise = sfuc cs out
I won't really grumble too much here about the irrelevant change
since it seems to be partly a consequence of your other work.
Improving code style is a good thing; I just try to encourage it
in separate patches when I can.
> - then speculateFileOrUrl (fn c) out
> - else copyFileOrUrl [] (fn c) out Cachable
> + then speculateFileOrUrl (fn c) out `catchNonSignal` (\e -> checkCacheReachability (show e) c)
> + else copyFileOrUrl [] (fn c) out Cachable `catchNonSignal` (\e -> checkCacheReachability (show e) c)
[note, whitespace chopped for review]
FIXME: Please amend out the conflict with Petr's DefaultDarcsRepo
change.
Old news on catching the copy/speculate failure
> +-- | Checks if a given cache entry is reachable or not.
> +-- It receives an error caught during execution and the cache entry.
> +-- For a local cache, if the given source doesn't exist anymore, it is added.
> +-- For HTTP sources if the error is timeout, it is added, if not we check for the
> +-- _darcs/hashed_inventory file, if doesn't exist it means we are pointing to a repository
> +-- which used to exist there, but had been moved.
>
> +-- For SSH if we get an error we try to get the file _darcs/hashed_inventory again, if it fails
> +-- we add the entry to the list of sources which are no reachables.
> +-- The entries which get added to the cache are no longer tried for the rest of the command.
FIXME: The haddock above is a bit ambiguous. Added where? You could
probably say blacklisted and/or whitelisted. Also, making each of these
into bullet points could make it a bit easier to read.
> +checkCacheReachability :: String -> CacheLoc -> IO ()
> +checkCacheReachability e cache
> + | isUrl source = do
> + reachable <- isReachableSource
> + unless (reachable source) $ do
> + let string = case dropWhile (/='(') e of
> + (_:xs) -> fst (break (==')') xs)
> + _ -> e
> + let cerror = case reads string ::[(HTTP.ConnectionError,String)] of
> + [(ce,_)] -> Just ce
> + _ -> Nothing
> + if isJust cerror
> + then addBadSource source
> + else do
> + checkHashedInventoryReachability cache
> + addReachableSource source
> + `catchNonSignal` (\_ -> addBadSource source)
...
> +-- | Checks if the _darcs/hashed_inventory exist and is reachable
> +checkHashedInventoryReachability :: CacheLoc -> IO ()
> +checkHashedInventoryReachability cache =
> + withTemp $ \tempout -> do
> + let f = cacheSource cache ++ "/" ++darcsdir ++ "/" ++ "hashed_inventory"
> + copyFileOrUrl [] f tempout Cachable
FIXME: I see you've done a bit of refactoring, and I think you could
make it a bit better. How about returning IO Bool instead? The idea is
that you'd move the exception handling into here, and the code above
could just test a conditional.
Code clarity is the number one thing I tend to go after (unfortunately
this does not necessarily mean I'm any good at it). It's why I seem to
be quite insistent on minor details like that; I think they pay off in
the long run especially when working in a team.
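[Editor's note: the IO Bool shape suggested above, reduced to its essence. `probeReachable` is a hypothetical name; in darcs the probe body would be the copyFileOrUrl call and the handler would use catchNonSignal rather than catching every exception.]

```haskell
import Control.Exception (SomeException, try)

-- The exception handling lives inside the probe; callers just branch
-- on the Bool instead of installing their own handlers.
probeReachable :: IO () -> IO Bool
probeReachable fetch = do
  r <- try fetch :: IO (Either SomeException ())
  return (either (const False) (const True) r)
```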
> - ffuc cache
> - `catchall` debugFail ("Couldn't fetch `"++f++"'\nin subdir "++(hashedDir subdir)++
> + badSource <- isBadSource
> + let cacheFiltered = filter (\ cacheEntry -> not . badSource $ cacheSource cacheEntry ) cache
> + ffuc cacheFiltered
> + `catchall` debugFail ("Couldn't fetch `"++f++"'\nin subdir "++ hashedDir subdir ++
> " from sources:\n\n"++show (Ca cache))
FIXME: sounds like you could do another pointfree refactor here
By the way, notice how much a little higher-order function like filter
lets us do without changing very much code? Just compose some stuff
together, feed it to filter and you're done.
> - `catchall` ffuc cs
> + `catchNonSignal` (\e -> do
> + checkCacheReachability (show e) c
> + badSource <- isBadSource
> + let cacheFiltered = filter (\ cacheEntry -> not . badSource $ cacheSource cacheEntry ) cs
> + ffuc cacheFiltered)
When I complained about catchall and catchNonSignal you pointed out that
x `catchall` y
is actually just
x `catchNonSignal` const y
As an aside, this kinda makes me think that maybe catchall isn't
/really/ needed, but I could be wrong.
FIXME: you're using filter (not . badSource . cacheSource) quite a lot.
Maybe what you want is something like this
filterOutBadSources :: [Cache] -> IO [Cache]
filterOutBadSources cs = do
badSource <- isBadSource
return $ filter (not . badSource . cacheSource) cs
HTTP module
~~~~~~~~~~~
> -waitNextUrl' = do
> - e <- curl_wait_next_url >>= peekCString
> - u <- curl_last_url >>= peekCString
> - return (u, e)
> +waitNextUrl' =
> + alloca $ \ errorPointer -> do
> + e <- curl_wait_next_url errorPointer >>= peekCString
> + ce <- if not (null e)
> + then do
> + errorNum <- peek errorPointer
> + case errorNum of
> + 6 -> return $ Just CouldNotResolveHost
> + 7 -> return $ Just CouldNotConnectToServer
> + 29 -> return $ Just OperationTimeout
> + _ -> return Nothing
> + else
> + return Nothing
> + u <- curl_last_url >>= peekCString
> + return (u, e, ce)
FIXME: You should have a comment pointing us to some documentation
on the error codes.
As a chatty aside, I like bracket functions like alloca which
prevent forgetful people like me from introducing random memory
leaks for want of a free and also avoid bugs like me accidentally
freeing something I shouldn't. I think we have similar idioms in
Darcs.Lock. Sort of an inane comment on my part :-)
> foreign import ccall "hscurl.h curl_wait_next_url"
> - curl_wait_next_url :: IO CString
> + curl_wait_next_url :: Ptr CInt -> IO CString
Thanks for teaching me a little more FFI
> main = withAtexit $ withSignalsHandled $
> flip catch execExceptionHandler $
> handle (\(AssertionFailed e) -> bug e) $ do
> + atexit reportBadSources
I think this is much cleaner than modifying Darcs.Global.
> + reportBadSources = do
> + sources <- getBadSourcesList
> + when ( not $ null sources ) $ do
FIXME: Clearer as unless (null sources), I think
> + hPutStderr $ "\nI could not reach the following " ++
> + englishNum (length sources) (Noun "repository") ":"
> + hPutStderr $ (unlines sources) ++ "If you're not using " ++
> + englishNum (length sources) It ", you should probably delete\nthe corresponding " ++
> + englishNum (length sources) (Noun "entry") " from _darcs/prefs/sources."
Not sure if this really belongs in main, maybe in one of the Cache
modules? I'm not the best person to judge.
FIXME: Maybe worth rewriting this using only one hPutStrLn stderr
plus some string concatenation. Concat may also be helpful here.
Basically, just try to make the code as easy to read as possible.
It's a bit subjective though, but I bet there are things you can
do which are objectively nicer.
> + hPutStderr = hPutStrLn stderr
FIXME: Ahah, for all my carrying on about refactoring, I actually think
this is one of those nuanced cases where the code duplication of just
saying hPutStrLn stderr is clearer than asking people to keep track of
an extra name, particularly people who have to maintain the code.
Subtle, huh?! You duplicate code and Eric complains. You refactor code
and then he complains some more! Can't win...
On the other hand, maybe I'd refactor the length sources if I were
you.
> int error = curl_easy_getinfo(easy, CURLINFO_PRIVATE, (char **)&url_data);
> - if (error != CURLE_OK)
> + if (error != CURLE_OK){
The if (foo) bar; style always seems to me like a clarity error
wrt if (foo) { bar; }... Good thing your change necessitates getting
rid of it.
> + *errorCode = error;
> + }
--
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
For a faster response, please try +44 (0)1273 64 2905.
msg11985 | Author: abuiles | Date: 2010-08-06.06:41:02
1 patch for repository http://darcs.net:
Fri Aug 6 01:27:53 COT 2010 builes.adolfo@googlemail.com
* Resolve issue 1599: automatically expire unused caches
Attachments
msg11986 | Author: kowey | Date: 2010-08-06.08:08:47
Hi Florent,
This patch bundle has been through a few kowey-lints. But I suspect
that in all my tweaking, I may have missed the forest for the trees.
May I hand this over to you? (let me know if you're still catching up
from holidays, or otherwise are not quite available yet)
msg11987 | Author: kowey | Date: 2010-08-06.08:50:00
Oh, Petr says he's got this one
msg12009 | Author: mornfall | Date: 2010-08-06.21:13:41
Hi,
since Eric requested a parallel review, let's go for it. Overall, I am
of the opinion that the patch could be pushed as it is, with a few
things fixed in a followup patch. I am pointing out some possible
stylistic cleanups below, and some refactoring opportunities. I'll be
happier if you pick up on those (be it as a followup or through
amendment), although I am not going to be terribly unhappy if the patch
stays as it is.
Yours,
Petr.
Resolve issue 1599: automatically expire unused caches
------------------------------------------------------
[export/import stuff]
> hunk ./src/Darcs/Global.hs 154
> resetCRCWarnings :: IO ()
> resetCRCWarnings = writeIORef _crcWarningList []
>
> +{- NOINLINE _badSourcesList -}
> +_badSourcesList :: IORef [String]
> +_badSourcesList = unsafePerformIO $ newIORef []
> +
> +addBadSource :: String -> IO ()
> +addBadSource cache = modifyIORef _badSourcesList (cache:)
> +
> +getBadSourcesList :: IO [String]
> +getBadSourcesList = readIORef _badSourcesList
> +
> +isBadSource :: IO (String -> Bool)
> +isBadSource = do badSources <- getBadSourcesList
> + return (`elem` badSources)
> +
> +{- NOINLINE _reachableSourcesList -}
> +_reachableSourcesList :: IORef [String]
> +_reachableSourcesList = unsafePerformIO $ newIORef []
> +
> +addReachableSource :: String -> IO ()
> +addReachableSource src = modifyIORef _reachableSourcesList (src:)
> +
> +getReachableSources :: IO [String]
> +getReachableSources = readIORef _reachableSourcesList
> +
> +isReachableSource :: IO (String -> Bool)
> +isReachableSource = do reachableSources <- getReachableSources
> + return (`elem` reachableSources)
> +
> darcsdir :: String
> darcsdir = "_darcs"
Some global bookkeeping. Doesn't make me very happy, but for now it
happens to be the best tradeoff we can get.
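The bookkeeping above uses GHC's top-level mutable state idiom. A minimal self-contained sketch (names borrowed from the hunk) shows the shape; the NOINLINE pragma is essential, since otherwise GHC may duplicate the IORef at each use site.

```haskell
import Data.IORef (IORef, newIORef, modifyIORef, readIORef)
import System.IO.Unsafe (unsafePerformIO)

-- Top-level mutable state, as in the Darcs.Global hunk above.
{-# NOINLINE _badSources #-}
_badSources :: IORef [String]
_badSources = unsafePerformIO (newIORef [])

addBadSource :: String -> IO ()
addBadSource s = modifyIORef _badSources (s :)

isBadSource :: IO (String -> Bool)
isBadSource = do badSources <- readIORef _badSources
                 return (`elem` badSources)

main :: IO ()
main = do
  addBadSource "http://example.org/repo"   -- hypothetical source
  bad <- isBadSource
  print (bad "http://example.org/repo", bad "http://darcs.net")
  -- prints (True,False)
```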
[import/export wibbles for Darcs.Repository]
[import/export bits of Darcs.Repository.Cache]
> hunk ./src/Darcs/Repository/Cache.hs 63
>
> data WritableOrNot = Writable | NotWritable deriving ( Show )
> data CacheType = Repo | Directory deriving ( Eq, Show )
> -data CacheLoc = Cache !CacheType !WritableOrNot !String
> +data CacheLoc = Cache { cacheType:: !CacheType, cacheWritable:: !WritableOrNot, cacheSource:: !String }
> newtype Cache = Ca [CacheLoc] -- abstract type for hiding cache
>
> instance Eq CacheLoc where
> hunk ./src/Darcs/Repository/Cache.hs 209
> do debugMessage $ "I'm doing copyFileUsingCache on "++(hashedDir subdir)++"/"++f
> Just stickItHere <- cacheLoc cache
> createDirectoryIfMissing False (reverse $ dropWhile (/='/') $ reverse stickItHere)
> - sfuc cache stickItHere
> + cacheFiltered <- filterBadSources cache
> + sfuc cacheFiltered stickItHere
Filter the cache. I'd probably write
filterBadSources cache >>= (flip sfuc) stickItHere
(unless you need the cacheFiltered identifier somewhere later, but I
don't see that)
This could go in as an extra patch, if you feel like changing it.
> `catchall` return ()
> where cacheLoc [] = return Nothing
> cacheLoc (c:cs) | not $ writable c = cacheLoc cs
> hunk ./src/Darcs/Repository/Cache.hs 221
> case othercache of Just x -> return $ Just x
> Nothing -> return $ Just (fn c)
> sfuc [] _ = return ()
> - sfuc (c:cs) out | not $ writable c =
> + sfuc (c:cs) out
> + | not (writable c) =
> if oos == OnlySpeculate
> hunk ./src/Darcs/Repository/Cache.hs 224
> - then speculateFileOrUrl (fn c) out
> - else copyFileOrUrl DefaultRemoteDarcs (fn c) out Cachable
> - | otherwise = sfuc cs out
> + then speculateFileOrUrl (fn c) out `catchNonSignal` (\e -> checkCacheReachability (show e) c)
> + else copyFileOrUrl DefaultRemoteDarcs (fn c) out Cachable `catchNonSignal` (\e -> checkCacheReachability (show e) c)
> + | otherwise = sfuc cs out
OK. Some more checking of source validity.
> fn c = hashedFilePath c subdir f
>
> copyFilesUsingCache :: OrOnlySpeculate -> Cache -> HashedDir -> [String] -> IO ()
> hunk ./src/Darcs/Repository/Cache.hs 236
>
> data FromWhere = LocalOnly | Anywhere deriving ( Eq )
>
> +-- | Checks if a given cache entry is reachable or not.
> +-- It receives an error caught during execution and the cache entry.
> +-- If the caches is not reachable it is blacklisted and not longer tried for
> +-- the rest of the session. If it is reachable it is whitelisted and future errors with such
> +-- cache get ignore.
> +-- To determine reachability:
> +-- * For a local cache, if the given source doesn't exist anymore, it is blacklisted.
> +-- * For remote sources if the error is timeout, it is blacklisted, if not,
> +-- it checks if _darcs/hashed_inventory exist, if it does, the entry is whitelisted, if
> +-- it doesn't, it is blacklisted.
> +checkCacheReachability :: String -> CacheLoc -> IO ()
> +checkCacheReachability e cache
> + | isFile source = do
> + reachable <- isReachableSource
> + unless (reachable source) $ do
> + exist <- doesDirectoryExist source
> + if exist
> + then
> + addReachableSource source
> + else
> + addBadSource source
> + | isUrl source = do
> + reachable <- isReachableSource
> + unless (reachable source) $ do
> + let string = case dropWhile (/='(') e of
> + (_:xs) -> fst (break (==')') xs)
> + _ -> e
> + let cerror = case reads string ::[(HTTP.ConnectionError,String)] of
> + [(ce,_)] -> Just ce
> + _ -> Nothing
> + if isJust cerror
> + then addBadSource source
> + else checkFileReachability
> +
> + | isSsh source = do
> + reachable <- isReachableSource
> + unless (reachable source) checkFileReachability
> +
> + | otherwise = fail $ "unknown transport protocol for: " ++ source
> + where source = cacheSource cache
> + checkFileReachability = do
> + reachable <- checkHashedInventoryReachability cache
> + if reachable
> + then
> + addReachableSource source
> + else
> + addBadSource source
OK. I am not exactly excited about the abuse of String here, although
IIRC you have discussed that with Eric already. I won't dither on this
too much.
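The parenthesis-scraping in the isUrl branch can be factored into a small standalone function. A sketch (ConnectionError is redeclared locally only to keep the example self-contained):

```haskell
-- Mirrors checkCacheReachability's parsing: take the text between the
-- first pair of parentheses and try to Read it as a ConnectionError.
data ConnectionError = CouldNotResolveHost
                     | CouldNotConnectToServer
                     | OperationTimeout
                     deriving (Eq, Read, Show)

extractConnError :: String -> Maybe ConnectionError
extractConnError e =
  let inner = case dropWhile (/= '(') e of
                (_:xs) -> takeWhile (/= ')') xs
                []     -> e
  in case reads inner of
       [(ce, _)] -> Just ce
       _         -> Nothing

main :: IO ()
main = do
  print (extractConnError "user error (CouldNotResolveHost)")
  -- prints Just CouldNotResolveHost
  print (extractConnError "some unrelated failure")
  -- prints Nothing
```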
> +-- | Returns a list of reachables cache entries,
> +-- taking out the blacklisted entries
> +filterBadSources :: [CacheLoc] -> IO [CacheLoc]
> +filterBadSources cache = do
> + badSource <- isBadSource
> + return $ filter (not . badSource . cacheSource) cache
OK.
> +-- | Checks if the _darcs/hashed_inventory exist and is reachable
> +checkHashedInventoryReachability :: CacheLoc -> IO Bool
> +checkHashedInventoryReachability cache =
> + withTemp $ \tempout -> do
> + let f = cacheSource cache ++ "/" ++darcsdir ++ "/" ++ "hashed_inventory"
> + copyFileOrUrl DefaultRemoteDarcs f tempout Cachable
> + return True
> + `catchNonSignal` (\_ -> return False)
You could use System.FilePath.Posix ( (</>) ) above (note the _Posix_
there). Other than that, OK.
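Petr's suggestion, sketched with a hypothetical hashedInventoryPath helper (not in the patch). The Posix variant always joins with '/', which is what you want for URLs and remote paths even on Windows.

```haskell
import System.FilePath.Posix ((</>))

darcsdir :: String
darcsdir = "_darcs"

-- Hypothetical helper: build the remote path checked for reachability.
hashedInventoryPath :: String -> String
hashedInventoryPath source = source </> darcsdir </> "hashed_inventory"

main :: IO ()
main = putStrLn (hashedInventoryPath "http://darcs.net")
-- prints http://darcs.net/_darcs/hashed_inventory
```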
> fetchFileUsingCachePrivate :: FromWhere -> Cache -> HashedDir -> String -> IO (String, B.ByteString)
> fetchFileUsingCachePrivate fromWhere (Ca cache) subdir f =
> do when (fromWhere == Anywhere) $ copyFileUsingCache ActuallyCopy (Ca cache) subdir f
> hunk ./src/Darcs/Repository/Cache.hs 303
> - ffuc cache
> - `catchall` debugFail ("Couldn't fetch `"++f++"'\nin subdir "++(hashedDir subdir)++
> + cacheFiltered <- filterBadSources cache
> + ffuc cacheFiltered
> + `catchall` debugFail ("Couldn't fetch `"++f++"'\nin subdir "++ hashedDir subdir ++
> " from sources:\n\n"++show (Ca cache))
OK. Filter the cache. Why not (filterBadSources cache >>= ffuc)?
> where ffuc (c:cs)
> | not (writable c) && (Anywhere == fromWhere || isFile (fn c)) =
> hunk ./src/Darcs/Repository/Cache.hs 320
> fail $ "Hash failure in " ++ fn c
> return (fn c, x')
> else return (fn c, x) -- FIXME: create links in caches
> - `catchall` ffuc cs
> + `catchNonSignal` (\e -> do
> + checkCacheReachability (show e) c
> + cacheFiltered <- filterBadSources cs
> + ffuc cacheFiltered)
Another place to check and filter the cache.
Makes me wonder if you wouldn't like to write something like
checkCaches (show e) c cs >>= ffuc
instead...
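A sketch of the suggested checkCaches helper, with stubbed stand-ins for the real Darcs functions (CacheLoc simplified to String, and a fixed "bad" predicate, purely for illustration):

```haskell
type CacheLoc = String  -- stand-in for Darcs' CacheLoc

checkCacheReachability :: String -> CacheLoc -> IO ()
checkCacheReachability _ _ = return ()  -- stub for the real check

filterBadSources :: [CacheLoc] -> IO [CacheLoc]
filterBadSources = return . filter (/= "bad")  -- stub predicate

-- The repeated "check this cache, then filter the rest" step, named.
checkCaches :: String -> CacheLoc -> [CacheLoc] -> IO [CacheLoc]
checkCaches e c cs = do
  checkCacheReachability e c
  filterBadSources cs

main :: IO ()
main = checkCaches "timeout" "bad" ["good", "bad", "other"] >>= print
-- prints ["good","other"]
```

Call sites then become checkCaches (show e) c cs >>= ffuc, as suggested above.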
>
> | writable c =
> do x1 <- gzFetchFilePS (fn c) Cachable
> hunk ./src/Darcs/Repository/Cache.hs 337
> else return x1
> mapM_ (tryLinking (fn c)) cs
> return (fn c, x)
> - `catchall` do (fname,x) <- ffuc cs
> - do createCache c subdir
> - createLink fname (fn c)
> - return (fn c, x)
> - `catchall`
> - do gzWriteFilePS (fn c) x `catchall` return ()
> - return (fname,x)
> + `catchNonSignal` (\ e ->
> + do
> + checkCacheReachability (show e) c
> + cacheFiltered <- filterBadSources cs
> + (fname,x) <- ffuc cacheFiltered
> + do createCache c subdir
> + createLink fname (fn c)
> + return (fn c, x)
> + `catchall`
> + do gzWriteFilePS (fn c) x `catchall` return ()
> + return (fname,x))
Looks equivalent, plus fixes up the cache list on errors plus filters
the caches some more. May I add that fetchFileUsingCache is in dire
need of a refactor? If you have a bit of time later, I'd welcome that.
> | otherwise = ffuc cs
>
> ffuc [] = debugFail $ "No sources from which to fetch file `"++f++"'\n"++ show (Ca cache)
> hunk ./src/Darcs/Repository/Cache.hs 409
> +-- | Prints an error message with a list of bad caches.
> +reportBadSources :: IO ()
> +reportBadSources = do
> + sources <- getBadSourcesList
> + let size = length sources
> + unless (null sources) $ do
> + hPutStrLn stderr $
> + concat ["\nI could not reach the following ",
> + englishNum size (Noun "repository") ":"]
> + hPutStrLn stderr $
> + concat [unlines sources,
> + "If you're not using ",
> + englishNum size It ", you should probably delete\nthe corresponding ",
> + englishNum size (Noun "entry") " from _darcs/prefs/sources."]
(I know Eric made you use concat here... I'd go with unlines
personally. You can do this in an extra patch if you want to, or just
forget it otherwise.)
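For comparison, a self-contained sketch of the unlines variant; plural is a simplified stand-in for Darcs' englishNum, used here only to keep the example runnable.

```haskell
import System.IO (hPutStrLn, stderr)

-- Simplified stand-in for Darcs' englishNum pluralisation helper.
plural :: Int -> String -> String -> String
plural 1 one _    = one
plural _ _   many = many

reportBadSources :: [String] -> IO ()
reportBadSources []      = return ()
reportBadSources sources = hPutStrLn stderr $ unlines $
  [ ""
  , "I could not reach the following " ++ plural n "repository" "repositories" ++ ":"
  ] ++ sources ++
  [ "If you're not using " ++ plural n "it" "them" ++ ", you should probably delete"
  , "the corresponding " ++ plural n "entry" "entries" ++ " from _darcs/prefs/sources."
  ]
  where n = length sources

main :: IO ()
main = reportBadSources ["http://example.org/repo"]
```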
[export]
> hunk ./src/HTTP.hs 22
> import qualified Data.ByteString.Char8 as BC
> #endif
>
> +data ConnectionError = CouldNotResolveHost |
> + CouldNotConnectToServer |
> + OperationTimeout
> + deriving (Eq, Read, Show)
Enumerate some common connection errors for use with above.
> +
> fetchUrl :: String -> IO String
> postUrl
> :: String -- ^ url
> hunk ./src/HTTP.hs 35
> -> IO () -- ^ result
>
> requestUrl :: String -> FilePath -> a -> IO String
> -waitNextUrl :: IO (String, String)
> +waitNextUrl :: IO (String, String, Maybe ConnectionError)
This Maybe ConnectionError feels a bit backwards, but OK.
> #ifdef HAVE_HTTP
>
> hunk ./src/HTTP.hs 94
> waitNextUrl = do
> (u, f) <- readIORef requestedUrl
> if null u
> - then return ("", "No URL requested")
> + then return ("", "No URL requested", Nothing)
> else do writeIORef requestedUrl ("", "")
> e <- (fetchUrl u >>= \s -> B.writeFile f (BC.pack s) >> return "") `catch` h
> hunk ./src/HTTP.hs 97
> - return (u, e)
> + let ce = case e of
> + "timeout" -> Just OperationTimeout
> + _ -> Nothing
> + return (u, e, ce)
Extract a possible connection error. OK.
> where h = return . ioeGetErrorString
>
> getProxy :: IO String
[imports again]
> hunk ./src/URL.hs 225
> let l = pipeLength st
> when (l > 0) $ do
> dbg "URL.waitNextUrl start"
> - (u, e) <- liftIO $ waitNextUrl'
> + (u, e, ce) <- liftIO $ waitNextUrl'
> let p = inProgress st
> new_st = st { inProgress = Map.delete u p
> , pipeLength = l - 1 }
> hunk ./src/URL.hs 240
> else case Map.lookup u p of
> Just (f, _, _) -> do
> removeFileMayNotExist (f++"-new_"++randomJunk st)
> - downloadComplete u e
> + case ce of
> + Just httpError -> downloadComplete u (show httpError)
> + Nothing -> downloadComplete u e
Pass errors from waitNextUrl down the pipe. OK.
> debugMessage $ "URL.waitNextUrl failed: "++
> u++" "++f++" "++e
> Nothing -> bug $ "Another possible bug in URL.waitNextUrl: "++u++" "++e
> hunk ./src/URL.hs 265
> Just var -> do
> e <- readMVar var
> modifyMVar_ urlNotifications (return . (Map.delete u))
> - unless (null e) (debugFail $ "Failed to download URL "++u++": "++e)
> + unless (null e) $ do
> + debugMessage $ "Failed to download URL "++u++": "++e
> + fail e
Is this any different?
> Nothing -> return () -- file was already downloaded
>
> dbg :: String -> StateT a IO ()
> hunk ./src/URL.hs 293
>
> setDebugHTTP :: IO ()
> requestUrl :: String -> FilePath -> Cachable -> IO String
> -waitNextUrl' :: IO (String, String)
> +waitNextUrl' :: IO (String, String, Maybe ConnectionError)
> pipeliningEnabled :: IO Bool
>
> #ifdef HAVE_CURL
> hunk ./src/URL.hs 306
> err <- curl_request_url ustr fstr (cachableToInt cache) >>= peekCString
> return err
>
> -waitNextUrl' = do
> - e <- curl_wait_next_url >>= peekCString
> - u <- curl_last_url >>= peekCString
> - return (u, e)
> +waitNextUrl' =
> + bracket malloc free $ \ errorPointer -> do
> + e <- curl_wait_next_url errorPointer >>= peekCString
> + ce <- if not (null e)
> + then do
> + errorNum <- peek errorPointer
> + case errorNum of
> + 6 -> return $ Just CouldNotResolveHost
> + 7 -> return $ Just CouldNotConnectToServer
> + 29 -> return $ Just OperationTimeout
> + _ -> return Nothing
> + else
> + return Nothing
> + u <- curl_last_url >>= peekCString
> + return (u, e, ce)
Oi, ugly. Well, that's the destiny of FFI code... Just translates cURL
error codes to something more malleable (ConnectionError). (I'd probably
use alloca myself, but bracket malloc free is probably equivalent.)
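An alloca-based sketch of the same code-to-ConnectionError translation. The cURL call is replaced by a poke of a fake error code, so this is illustration rather than the real FFI path; the codes 6, 7 and 29 are the ones the patch handles.

```haskell
import Foreign.Marshal.Alloc (alloca)
import Foreign.C.Types (CInt)
import Foreign.Storable (poke, peek)

data ConnectionError = CouldNotResolveHost
                     | CouldNotConnectToServer
                     | OperationTimeout
                     deriving (Eq, Show)

-- Factor the numeric translation out into a pure function.
toConnectionError :: CInt -> Maybe ConnectionError
toConnectionError 6  = Just CouldNotResolveHost
toConnectionError 7  = Just CouldNotConnectToServer
toConnectionError 29 = Just OperationTimeout
toConnectionError _  = Nothing

-- alloca frees the buffer whether the action returns or throws, so it
-- is interchangeable with bracket malloc free here.
main :: IO ()
main = do
  ce <- alloca $ \errorPointer -> do
          poke errorPointer (6 :: CInt)  -- pretend cURL reported code 6
          errorNum <- peek errorPointer
          return (toConnectionError errorNum)
  print ce  -- prints Just CouldNotResolveHost
```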
> pipeliningEnabled = do
> r <- curl_pipelining_enabled
> hunk ./src/URL.hs 330
> curl_request_url :: CString -> CString -> CInt -> IO CString
>
> foreign import ccall "hscurl.h curl_wait_next_url"
> - curl_wait_next_url :: IO CString
> + curl_wait_next_url :: Ptr CInt -> IO CString
>
> foreign import ccall "hscurl.h curl_last_url"
> curl_last_url :: IO CString
FFI bits. OK.
[imports for darcs.hs]
> hunk ./src/darcs.hs 53
> main = withAtexit $ withSignalsHandled $
> flip catch execExceptionHandler $
> handle (\(AssertionFailed e) -> bug e) $ do
> + atexit reportBadSources
> argv <- getArgs
> case argv of
> -- User called "darcs" without arguments.
[OK] Register bad sources reporting to run at exit time.
> hunk ./src/hscurl.c 253
> return error_strings[RESULT_OK];
> }
>
> -const char *curl_wait_next_url()
> +const char *curl_wait_next_url(int* errorCode)
> {
> hunk ./src/hscurl.c 255
> + *errorCode = -1;
> +
> if (last_url != NULL)
> {
> free(last_url);
> hunk ./src/hscurl.c 280
> CURLcode result = msg->data.result;
> struct UrlData *url_data;
> int error = curl_easy_getinfo(easy, CURLINFO_PRIVATE, (char **)&url_data);
> - if (error != CURLE_OK)
> + if (error != CURLE_OK){
Iew, that ){ is hurting my eyes. A bit of whitespace couldn't hurt! :)
> + *errorCode = error;
> return curl_easy_strerror(error);
> hunk ./src/hscurl.c 283
> + }
>
> last_url = url_data->url;
> fclose(url_data->file);
> hunk ./src/hscurl.c 295
> return curl_multi_strerror(error);
> curl_easy_cleanup(easy);
>
> - if (result != CURLE_OK)
> + if (result != CURLE_OK){
> + *errorCode = result;
> return curl_easy_strerror(result);
> hunk ./src/hscurl.c 298
> + }
> }
> else
> return error_strings[RESULT_UNKNOWN_MESSAGE];
> hunk ./src/hscurl.h 5
> const char *filename,
> int cache_time);
>
> -const char *curl_wait_next_url();
> +const char *curl_wait_next_url(int *errorCode);
>
> const char *curl_last_url();
OK.
msg12010 | Author: abuiles | Date: 2010-08-07.00:32:03
Hi Petr,
Thanks for the review! I'm amending the stylistic things and resending.
>> Nothing -> bug $ "Another possible bug in URL.waitNextUrl: "++u++" "++e
>> hunk ./src/URL.hs 265
>> Just var -> do
>> e <- readMVar var
>> modifyMVar_ urlNotifications (return . (Map.delete u))
>> - unless (null e) (debugFail $ "Failed to download URL "++u++": "++e)
>> + unless (null e) $ do
>> + debugMessage $ "Failed to download URL "++u++": "++e
>> + fail e
> Is this any different?
>
The message of debugFail is the entire string you pass it as a parameter.
--
Adolfo
msg12011 | Author: abuiles | Date: 2010-08-07.00:41:37
1 patch for repository http://darcs.net:
Fri Aug 6 19:33:39 COT 2010 builes.adolfo@googlemail.com
* Resolve issue 1599: automatically expire unused caches
Attachments
msg12021 | Author: darcswatch | Date: 2010-08-07.13:42:16
This patch bundle (with 1 patches) was just applied to the repository http://darcs.net/.
This message was brought to you by DarcsWatch
http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-90eebb0527b9056e3ab6a71c4e2df7e296fd8866
msg14055 | Author: darcswatch | Date: 2011-05-10.17:35:41
This patch bundle (with 1 patches) was just applied to the repository http://darcs.net/reviewed.
This message was brought to you by DarcsWatch
http://darcswatch.nomeata.de/repo_http:__darcs.net_reviewed.html#bundle-90eebb0527b9056e3ab6a71c4e2df7e296fd8866
Date | User | Action | Args
2010-07-22 06:12:24 | abuiles | create | |
2010-07-22 06:14:48 | darcswatch | set | darcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-a21fdcf51e31ec24feb7ff9c98bf3be16c0649b0 |
2010-07-22 15:47:57 | kowey | set | nosy: + kowey; messages: + msg11825 |
2010-07-22 18:08:14 | mornfall | set | nosy: + mornfall; messages: + msg11828 |
2010-07-23 01:35:58 | abuiles | set | messages: + msg11831 |
2010-07-23 11:50:46 | kowey | set | messages: + msg11837 |
2010-07-23 11:50:58 | kowey | set | status: needs-review -> followup-in-progress |
2010-07-23 11:51:15 | kowey | set | assignedto: abuiles |
2010-07-23 11:58:05 | ganesh | set | nosy: + ganesh; messages: + msg11839 |
2010-07-28 13:47:46 | abuiles | set | files: + -resolve-issue-1599_-automatically-expire-unused-caches.dpatch, unnamed; messages: + msg11895 |
2010-07-28 13:49:42 | darcswatch | set | darcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-a21fdcf51e31ec24feb7ff9c98bf3be16c0649b0 -> http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-1092d9b218048d9df86a58a1dd51f87cec476246 |
2010-07-28 17:46:38 | kowey | set | messages: + msg11898 |
2010-07-28 20:55:19 | abuiles | set | files: + resolve-issue-1599_-automatically-expire-unused-caches.dpatch, unnamed; messages: + msg11899 |
2010-07-28 21:00:05 | darcswatch | set | darcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-1092d9b218048d9df86a58a1dd51f87cec476246 -> http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-886b1eeb38de006ebd09934e0cc42cb01d4192ad |
2010-08-01 18:33:11 | kowey | set | messages: + msg11905 |
2010-08-01 18:37:41 | kowey | set | messages: + msg11906 |
2010-08-03 05:50:38 | abuiles | set | messages: + msg11921 |
2010-08-03 10:25:32 | kowey | set | messages: + msg11923 |
2010-08-03 14:32:49 | abuiles | set | files: + resolve-issue-1599_-automatically-expire-unused-caches.dpatch, unnamed; messages: + msg11926 |
2010-08-03 14:34:28 | darcswatch | set | darcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-886b1eeb38de006ebd09934e0cc42cb01d4192ad -> http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-9d53d8c4757b9f6f451ddcb9cd5d1112aa3f73e6 |
2010-08-05 13:38:00 | kowey | set | status: followup-in-progress -> followup-requested; messages: + msg11975 |
2010-08-06 06:41:02 | abuiles | set | files: + resolve-issue-1599_-automatically-expire-unused-caches.dpatch, unnamed; messages: + msg11985 |
2010-08-06 06:43:50 | darcswatch | set | darcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-9d53d8c4757b9f6f451ddcb9cd5d1112aa3f73e6 -> http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-3cbfa82b84c0f9e7da37c676f5f7203bd88bf716 |
2010-08-06 08:08:48 | kowey | set | status: followup-requested -> needs-review; assignedto: abuiles -> galbolle; messages: + msg11986; nosy: + galbolle |
2010-08-06 08:50:00 | kowey | set | assignedto: galbolle -> mornfall; messages: + msg11987 |
2010-08-06 21:13:42 | mornfall | set | messages: + msg12009 |
2010-08-07 00:32:03 | abuiles | set | messages: + msg12010 |
2010-08-07 00:41:37 | abuiles | set | files: + resolve-issue-1599_-automatically-expire-unused-caches.dpatch, unnamed; messages: + msg12011 |
2010-08-07 00:43:46 | darcswatch | set | darcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-3cbfa82b84c0f9e7da37c676f5f7203bd88bf716 -> http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-90eebb0527b9056e3ab6a71c4e2df7e296fd8866 |
2010-08-07 13:42:16 | darcswatch | set | status: needs-review -> accepted; messages: + msg12021 |
2011-05-10 17:35:41 | darcswatch | set | messages: + msg14055 |
2011-05-10 17:36:13 | darcswatch | set | darcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-90eebb0527b9056e3ab6a71c4e2df7e296fd8866 -> http://darcswatch.nomeata.de/repo_http:__darcs.net_reviewed.html#bundle-a21fdcf51e31ec24feb7ff9c98bf3be16c0649b0 |
2011-05-10 18:05:52 | darcswatch | set | darcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_reviewed.html#bundle-a21fdcf51e31ec24feb7ff9c98bf3be16c0649b0 -> http://darcswatch.nomeata.de/repo_http:__darcs.net_reviewed.html#bundle-9d53d8c4757b9f6f451ddcb9cd5d1112aa3f73e6 |
2011-05-10 19:05:43 | darcswatch | set | darcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_reviewed.html#bundle-9d53d8c4757b9f6f451ddcb9cd5d1112aa3f73e6 -> http://darcswatch.nomeata.de/repo_http:__darcs.net_reviewed.html#bundle-886b1eeb38de006ebd09934e0cc42cb01d4192ad |
2011-05-10 19:37:40 | darcswatch | set | darcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_reviewed.html#bundle-886b1eeb38de006ebd09934e0cc42cb01d4192ad -> http://darcswatch.nomeata.de/repo_http:__darcs.net_reviewed.html#bundle-3cbfa82b84c0f9e7da37c676f5f7203bd88bf716 |
2011-05-10 22:07:23 | darcswatch | set | darcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_reviewed.html#bundle-3cbfa82b84c0f9e7da37c676f5f7203bd88bf716 -> http://darcswatch.nomeata.de/repo_http:__darcs.net_reviewed.html#bundle-1092d9b218048d9df86a58a1dd51f87cec476246 |