darcs

Patch 294 Use cache while getting packed repository

Title Use cache while getting packed repository
Superseder Nosy List exlevan, kowey, mornfall
Related Issues
Status accepted Assigned To mornfall
Milestone

Created on 2010-07-03.08:48:12 by exlevan, last changed 2011-05-10.21:36:46 by darcswatch. Tracked on DarcsWatch.

Files
File name Status Uploaded Type Edit Remove
hardlink-files-while-getting-a-packed-repository.dpatch exlevan, 2010-08-02.05:05:29 text/x-darcs-patch
unnamed exlevan, 2010-07-03.08:48:11
unnamed exlevan, 2010-08-02.05:05:29
use-cache-while-getting-packed-repository.dpatch exlevan, 2010-07-03.08:48:11 text/x-darcs-patch
See mailing list archives for discussion on individual patches.
Messages
msg11676 (view) Author: exlevan Date: 2010-07-03.08:48:11
1 patch for repository http://darcs.net/:

Sat Jul  3 11:41:45 EEST 2010  Alexey Levan <exlevan@gmail.com>
  * Use cache while getting packed repository
Attachments
msg11678 (view) Author: mornfall Date: 2010-07-04.19:12:47
Hi Alexey,

one thing that would be certainly useful is a more elaborate description
of a bundle when you send it (I have this in my ~/.darcs/defaults: "send
edit-description" since I otherwise keep forgetting to invoke send with
--edit). I will try to review the patch, although I am not sure what was
its intent.

Use cache while getting packed repository
-----------------------------------------

> hunk ./src/Darcs/Commands/Optimize.lhs 27
>  import Control.Applicative ( (<$>) )
>  import Control.Monad ( when, unless )
>  import Data.Maybe ( isJust )
> +import Data.List ( sort )
>  import System.Directory ( getDirectoryContents, doesDirectoryExist,
> hunk ./src/Darcs/Commands/Optimize.lhs 29
> -                          doesFileExist, renameFile )
> +                          doesFileExist, renameFile, getModificationTime )
>  import System.IO.Unsafe ( unsafeInterleaveIO )
>  import qualified Data.ByteString.Char8 as BS
>  import qualified Data.ByteString.Lazy as BL
> hunk ./src/Darcs/Commands/Optimize.lhs 36
>  
>  import Storage.Hashed.Darcs( decodeDarcsSize )
>  
> -import Darcs.Hopefully ( info )
> +import Darcs.Hopefully ( info, extractHash )
>  import Darcs.Commands ( DarcsCommand(..), nodefaults )
>  import Darcs.Arguments ( DarcsFlag( UpgradeFormat, UseHashedInventory,
>                                      Compress, UnCompress,
> hunk ./src/Darcs/Commands/Optimize.lhs 73
>  
>  -- imports for optimize --upgrade; to be tidied
>  import System.Directory ( createDirectoryIfMissing, removeFile )
> -import System.FilePath.Posix ( takeExtension, (</>), (<.>) )
> +import System.FilePath.Posix ( takeExtension, (</>), (<.>), takeFileName )
>  
>  import Progress ( beginTedious, endTedious, tediousSize )
>  import Darcs.Flags ( compression )
(just imports)

> hunk ./src/Darcs/Commands/Optimize.lhs 148
>  optimizeCmd origopts _ = do
>      when (UpgradeFormat `elem` origopts) optimizeUpgradeFormat
>      withRepoLock opts $- \repository -> do
> -    when (OptimizeHTTP `elem` origopts) doOptimizeHTTP
> +    when (OptimizeHTTP `elem` origopts) $ doOptimizeHTTP repository
>      if (OptimizePristine `elem` opts)
>         then doOptimizePristine repository
>         else do cleanRepository repository
Just an extra parameter.

> hunk ./src/Darcs/Commands/Optimize.lhs 372
>        gzs <- filter ((== ".gz") . takeExtension) `fmap` getDirectoryContents "."
>        mapM_ removeFile gzs
>  
> -doOptimizeHTTP :: IO ()
> -doOptimizeHTTP = do
> +doOptimizeHTTP :: RepoPatch p => Repository p C(r u t) -> IO ()
> +doOptimizeHTTP repo = do
>    rf <- either fail return =<< identifyRepoFormat "."
>    unless (formatHas HashedInventory rf) . fail $
>      "Unsupported repository format:\n" ++
> hunk ./src/Darcs/Commands/Optimize.lhs 379
>      "  only hashed repositories can be optimized for HTTP"
>    createDirectoryIfMissing False packsDir
> -  ps <- dirContents' "patches" $ \x -> all (x /=) ["unrevert", "pending",
> -    "pending.tentative"]
> +  ps <- mapRL hashedPatchFileName . newset2RL <$> readRepo repo
We use inventories to get a list of patches instead of listing directory
contents -- this should be indeed more robust. OK. We could also do
something like this to get a list of the actually relevant inventories:
these tend to accumulate a fair amount of garbage, actually, so it may
be even more useful in that case. Check out readInventories and
readInventoryPrivate in Darcs.Repository.HashedRepo.  (The latter reads
a single inventory file, and gives you a list of PatchInfos and a Maybe
a String (= hash) of the next inventory in the list. Nothing here means
there are no more inventories. (I should probably turn that into a
haddock. Hm.)

>    BL.writeFile (patchesTar <.> "part") . compress . write =<<
>      mapM fileEntry' ps
>    renameFile (patchesTar <.> "part") patchesTar
> hunk ./src/Darcs/Commands/Optimize.lhs 383
> -  let i = darcsdir </> "hashed_inventory"
> -  is <- dirContents "inventories"
> -  pr <- dirContents "pristine.hashed"
> -  BL.writeFile (basicTar <.> "part") . compress . write =<<
> -    mapM fileEntry' (i : (is ++ pr))
> +  is <- sortByMTime =<< dirContents "inventories"
> +  writeFile (darcsdir </> "tmp-inventories") . unlines $ map takeFileName is
> +  pr <- sortByMTime =<< dirContents "pristine.hashed"
> +  writeFile (darcsdir </> "tmp-pristine") . unlines $ map takeFileName pr
> +  BL.writeFile (basicTar <.> "part") . compress . write =<< mapM fileEntry' (
> +    [ darcsdir </> "tmp-inventories"
> +    , darcsdir </> "tmp-pristine"
> +    , darcsdir </> "hashed_inventory"
> +    ] ++ reverse pr ++ reverse is)
This sorts the basic.tar.gz in a newest-first order (pristines first,
then inventories). It did make me wonder though, whether we actually
want the inventories in the basic.tar.gz... They are not fetched by
ordinary lazy get, are they?

It seems that tmp-pristine and tmp-inventories come with a list of files
that are packed in the tarball. I am not sure this is necessary, since
the tar itself already is a list with filenames in it?

>    renameFile (basicTar <.> "part") basicTar
> hunk ./src/Darcs/Commands/Optimize.lhs 393
> +  removeFile $ darcsdir </> "tmp-inventories"
> +  removeFile $ darcsdir </> "tmp-pristine"
Get rid of the temporaries (could these be avoided by constructing these
two lists in memory and feeding them to Tar?).

>   where
>    packsDir = darcsdir </> "packs"
>    basicTar = packsDir </> "basic.tar.gz"
> hunk ./src/Darcs/Commands/Optimize.lhs 406
>    dirContents d = dirContents' d $ const True
>    dirContents' d f = map ((darcsdir </> d) </>) . filter (\x ->
>      head x /= '.' && f x) <$> getDirectoryContents (darcsdir </> d)
> +  hashedPatchFileName x = case extractHash x of
> +    Left _ -> fail "unexpected unhashed patch"
> +    Right h -> darcsdir </> "patches" </> h
> +  sortByMTime xs = map snd . sort <$> mapM (\x -> (\t -> (t, x)) <$>
> +    getModificationTime x) xs
Helpers.

> hunk ./src/Darcs/Repository.hs 48
>      ) where
>  
>  import System.Exit ( ExitCode(..), exitWith )
> -import Data.List ( isSuffixOf )
>  import Data.Maybe( catMaybes )
>  
>  import Darcs.Repository.State( readRecorded, readUnrecorded, readWorking, unrecordedChanges
> hunk ./src/Darcs/Repository.hs 77
>      )
>  import Darcs.Repository.Merge( tentativelyMergePatches, considerMergeToWorking )
>  import Darcs.Repository.Cache ( unionRemoteCaches, fetchFileUsingCache,
> -                                speculateFileUsingCache, HashedDir(..), Cache(..), CacheLoc(..), WritableOrNot(..))
> +                                speculateFileUsingCache, HashedDir(..), Cache(..),
> +                                CacheLoc(..), WritableOrNot(..), hashedDir )
>  import Darcs.Patch.Set ( PatchSet(..), SealedPatchSet, newset2RL, newset2FL, progressPatchSet )
>  #ifdef GADT_WITNESSES
>  import Darcs.Patch.Set ( Origin )
> hunk ./src/Darcs/Repository.hs 86
>  import URL ( maxPipelineLength )
>  
>  import Control.Applicative ( (<$>) )
> +import Control.Concurrent ( forkIO )
>  import Control.Monad ( unless, when )
>  import System.Directory ( createDirectory, renameDirectory,
> hunk ./src/Darcs/Repository.hs 89
> -                          createDirectoryIfMissing, renameFile )
> +                          createDirectoryIfMissing, renameFile, doesFileExist )
>  import System.IO.Error ( isAlreadyExistsError )
>  
>  import qualified Darcs.Repository.DarcsRepo as DarcsRepo
> hunk ./src/Darcs/Repository.hs 134
>  import qualified Codec.Archive.Tar as Tar
>  import Codec.Compression.GZip ( compress, decompress )
>  import qualified Data.ByteString.Char8 as BS
> -import qualified Data.ByteString.Lazy as BL
> +import qualified Data.ByteString.Lazy.Char8 as BL
>  
>  #include "impossible.h"
>  
Imports.

> hunk ./src/Darcs/Repository.hs 287
>    createDirectoryIfMissing False $ toDir </> darcsdir </> "pristine.hashed"
>    createDirectoryIfMissing False $ toDir </> darcsdir </> "patches"
>    copySources toRepo fromDir
> +  Repo _ _ _ (DarcsRepository _ toCache3) <-
> +    identifyRepositoryFor fromRepo "."
I am not sure why is this here and what is the effect? What is the
difference from toCache2?

>    -- unpack inventory & pristine cache
> hunk ./src/Darcs/Repository.hs 290
> -  writeCompressed . Tar.read $ decompress b
> +  writeBasic toCache3 . Tar.read $ decompress b
>    createPristineDirectoryTree toRepo "."
>    -- pull new patches
>    us <- readRepo toRepo
> hunk ./src/Darcs/Repository.hs 306
>    -- get old patches
>    unless (any (`elem` opts) [Partial, Lazy, Ephemeral]) $ do
>      putInfo "Copying patches, to get lazy repository hit ctrl-C..."
> +    _ <- forkIO . fetchFiles toCache3 HashedPatchesDir .
> +      mapFL hashedPatchFileName $ newset2FL us
>      writeCompressed . Tar.read . decompress =<< fetchFileLazyPS (fromPacksDir ++
>        "patches.tar.gz") Uncachable
>   where
> hunk ./src/Darcs/Repository.hs 311
> +  writeBasic c (Tar.Next is (Tar.Next pr (Tar.Next hi xs))) = do
> +    case map Tar.entryContent [is, pr, hi] of
> +      [Tar.NormalFile is' _, Tar.NormalFile pr' _, Tar.NormalFile hi' _] -> do
> +        _ <- forkIO $ do
> +          fetchFiles c HashedInventoriesDir . lines $ BL.unpack is'
> +          fetchFiles c HashedPristineDir . lines $ BL.unpack pr'
> +        BL.writeFile (darcsdir </> "hashed_inventory") hi'
> +        writeCompressed xs
> +      _ -> fail "Unexpected non-file tar entry"
> +  writeBasic _ _ =  fail "Error in basic tar file"
For all I can tell, this extracts the tmp-{pristine,inventories} lists
from the tarball and starts to copy them from cache. We do this
concurrently with unpacking the tarball. Seems that first wins. I am
however wondering, if the cache we are using here only contains local
sources? (I.e. caches and possibly repos...) In that case, it would make
some sense, although I don't see any exception handling in fetchFiles
(below). The overall idea is interesting, but I don't see how it saves
any bandwidth (or time): we still need to download the complete
basic.tar.gz, right?

Oh. I see now that you stop the unpacking and the download once you have
a *first* hit... I think that's wrong though. It is, if nothing else,
prone to races: either of the threads could win. If the tarball loses,
it is not used at all, which could happen even if you don't have
anything cached. Nevertheless, it is still downloaded, at least until
the process is terminated, slowing down the process of getting any
pristine content and patches that may be missing in the cache. Moreover,
it could also happen that even though you have almost everything in the
cache, the tarball is fully downloaded.

So instead it would be good if we could stop downloading once we have
all the files. Also, a solution like that wouldn't need the
tmp-{pristine,inventories} lists nor the concurrency, since it would
need to parse the directory entries and inventories anyway and could see
whether anything is still missing. The tarball would come in the mtime
order as it already does (but hashed_inventory needs to come first, in
any case). Then, you could just maintain a list of hashes that are still
missing as you are processing the tarball. Once that list is empty, you
stop. The list starts with pristine root, which is stored in
hashed_inventory, and every time an item is removed from the list,
anything it mentions (this is where you need to parse things) and we
don't have yet (you look in the cache for those) is added to it. There
are utilities in Storage.Hashed.Darcs that should help with that.

Also, anything you unpack from the tarball should be linked into the
cache if it's not there yet. If it is, the tarball copy should be
probably ignored and you should hardlink from the cache instead (saves
space).

>    writeCompressed Tar.Done = return ()
>    writeCompressed (Tar.Next x xs) = case Tar.entryContent x of
>      Tar.NormalFile x' _ -> do
> hunk ./src/Darcs/Repository.hs 325
>        let p = Tar.entryPath x
> -      withTemp $ \p' -> do
> -        BL.writeFile p' $ if "hashed_inventory" `isSuffixOf` p
> -          then x'
> -          else compress x'
> -        renameFile p' p
> -      writeCompressed xs
> +      ex <- doesFileExist p
> +      unless ex $ do
> +        withTemp $ \p' -> do
> +          BL.writeFile p' $ compress x'
> +          renameFile p' p
> +        writeCompressed xs
Avoids clobbering and removes the special handling of hashed_inventory
(that one is now handled in writeBasic).

>      _ -> fail "Unexpected non-file tar entry"
>    writeCompressed (Tar.Fail e) = fail e
>    putInfo = when (not $ Quiet `elem` opts) . putStrLn
> hunk ./src/Darcs/Repository.hs 334
> -
> +  fetchFiles _ _ [] = return ()
> +  fetchFiles c d (f:fs) = do
> +    x <- doesFileExist $ hashedDir d </> f
> +    unless x $ do
> +      fetchFileUsingCache c d f
> +      fetchFiles c d fs
> +  hashedPatchFileName x = case extractHash x of
> +    Left _ -> fail "unexpected unhashed patch"
> +    Right h -> h
Helpers.

Yours,
   Petr.

PS: I haven't had the chance to look at the performance of the patch
yet. I won't apply it at least until I hear from you regarding the above
comments -- I will try to benchmark it and see if it works at least in
the usual cases. Something will nevertheless have to be done about the
download of the tarball in cases where it is never used.
msg11742 (view) Author: exlevan Date: 2010-07-14.11:01:05
Hi, sorry for delay with answer.

>
>> hunk ./src/Darcs/Commands/Optimize.lhs 148
>>  optimizeCmd origopts _ = do
>>      when (UpgradeFormat `elem` origopts) optimizeUpgradeFormat
>>      withRepoLock opts $- \repository -> do
>> -    when (OptimizeHTTP `elem` origopts) doOptimizeHTTP
>> +    when (OptimizeHTTP `elem` origopts) $ doOptimizeHTTP repository
>>      if (OptimizePristine `elem` opts)
>>         then doOptimizePristine repository
>>         else do cleanRepository repository
> Just an extra parameter.
>
>> hunk ./src/Darcs/Commands/Optimize.lhs 372
>>        gzs <- filter ((== ".gz") . takeExtension) `fmap` getDirectoryContents "."
>>        mapM_ removeFile gzs
>>
>> -doOptimizeHTTP :: IO ()
>> -doOptimizeHTTP = do
>> +doOptimizeHTTP :: RepoPatch p => Repository p C(r u t) -> IO ()
>> +doOptimizeHTTP repo = do
>>    rf <- either fail return =<< identifyRepoFormat "."
>>    unless (formatHas HashedInventory rf) . fail $
>>      "Unsupported repository format:\n" ++
>> hunk ./src/Darcs/Commands/Optimize.lhs 379
>>      "  only hashed repositories can be optimized for HTTP"
>>    createDirectoryIfMissing False packsDir
>> -  ps <- dirContents' "patches" $ \x -> all (x /=) ["unrevert", "pending",
>> -    "pending.tentative"]
>> +  ps <- mapRL hashedPatchFileName . newset2RL <$> readRepo repo
> We use inventories to get a list of patches instead of listing directory
> contents -- this should be indeed more robust. OK. We could also do
> something like this to get a list of the actually relevant inventories:
> these tend to accumulate a fair amount of garbage, actually, so it may
> be even more useful in that case. Check out readInventories and
> readInventoryPrivate in Darcs.Repository.HashedRepo.  (The latter reads
> a single inventory file, and gives you a list of PatchInfos and a Maybe
> a String (= hash) of the next inventory in the list. Nothing here means
> there are no more inventories. (I should probably turn that into a
> haddock. Hm.)

OK, sounds good.

>>    BL.writeFile (patchesTar <.> "part") . compress . write =<<
>>      mapM fileEntry' ps
>>    renameFile (patchesTar <.> "part") patchesTar
>> hunk ./src/Darcs/Commands/Optimize.lhs 383
>> -  let i = darcsdir </> "hashed_inventory"
>> -  is <- dirContents "inventories"
>> -  pr <- dirContents "pristine.hashed"
>> -  BL.writeFile (basicTar <.> "part") . compress . write =<<
>> -    mapM fileEntry' (i : (is ++ pr))
>> +  is <- sortByMTime =<< dirContents "inventories"
>> +  writeFile (darcsdir </> "tmp-inventories") . unlines $ map takeFileName is
>> +  pr <- sortByMTime =<< dirContents "pristine.hashed"
>> +  writeFile (darcsdir </> "tmp-pristine") . unlines $ map takeFileName pr
>> +  BL.writeFile (basicTar <.> "part") . compress . write =<< mapM fileEntry' (
>> +    [ darcsdir </> "tmp-inventories"
>> +    , darcsdir </> "tmp-pristine"
>> +    , darcsdir </> "hashed_inventory"
>> +    ] ++ reverse pr ++ reverse is)
> This sorts the basic.tar.gz in a newest-first order (pristines first,
> then inventories). It did make me wonder though, whether we actually
> want the inventories in the basic.tar.gz... They are not fetched by
> ordinary lazy get, are they?

Having inventories in basic doesn't mean they are going to be fetched
:) Stopping download before the inventories (in case of get --lazy) is
optimisation waiting to be implemented.

> It seems that tmp-pristine and tmp-inventories come with a list of files
> that are packed in the tarball. I am not sure this is necessary, since
> the tar itself already is a list with filenames in it?

AFAICT filenames in tar interleaved with file contents, and list of
files is needed to early start parallel fetching from cache.

>>    renameFile (basicTar <.> "part") basicTar
>> hunk ./src/Darcs/Commands/Optimize.lhs 393
>> +  removeFile $ darcsdir </> "tmp-inventories"
>> +  removeFile $ darcsdir </> "tmp-pristine"
> Get rid of the temporaries (could these be avoided by constructing these
> two lists in memory and feeding them to Tar?).

Well, yes, but that would mean allocating lots of memory in case of
large repositories. Look like a small tradeoff in favour of
scalability for me.

>> hunk ./src/Darcs/Repository.hs 287
>>    createDirectoryIfMissing False $ toDir </> darcsdir </> "pristine.hashed"
>>    createDirectoryIfMissing False $ toDir </> darcsdir </> "patches"
>>    copySources toRepo fromDir
>> +  Repo _ _ _ (DarcsRepository _ toCache3) <-
>> +    identifyRepositoryFor fromRepo "."
> I am not sure why is this here and what is the effect? What is the
> difference from toCache2?

in copySources the source file is being rewritten, so I re-read repo
to get a new Cache. The difference is that toCache2 doesn't work :) Of
course, if there are some helpers to avoid that that I'm missing, I'll
rewrite it to make the code a bit cleaner.

>>    -- unpack inventory & pristine cache
>> hunk ./src/Darcs/Repository.hs 290
>> -  writeCompressed . Tar.read $ decompress b
>> +  writeBasic toCache3 . Tar.read $ decompress b
>>    createPristineDirectoryTree toRepo "."
>>    -- pull new patches
>>    us <- readRepo toRepo
>> hunk ./src/Darcs/Repository.hs 306
>>    -- get old patches
>>    unless (any (`elem` opts) [Partial, Lazy, Ephemeral]) $ do
>>      putInfo "Copying patches, to get lazy repository hit ctrl-C..."
>> +    _ <- forkIO . fetchFiles toCache3 HashedPatchesDir .
>> +      mapFL hashedPatchFileName $ newset2FL us
>>      writeCompressed . Tar.read . decompress =<< fetchFileLazyPS (fromPacksDir ++
>>        "patches.tar.gz") Uncachable
>>   where
>> hunk ./src/Darcs/Repository.hs 311
>> +  writeBasic c (Tar.Next is (Tar.Next pr (Tar.Next hi xs))) = do
>> +    case map Tar.entryContent [is, pr, hi] of
>> +      [Tar.NormalFile is' _, Tar.NormalFile pr' _, Tar.NormalFile hi' _] -> do
>> +        _ <- forkIO $ do
>> +          fetchFiles c HashedInventoriesDir . lines $ BL.unpack is'
>> +          fetchFiles c HashedPristineDir . lines $ BL.unpack pr'
>> +        BL.writeFile (darcsdir </> "hashed_inventory") hi'
>> +        writeCompressed xs
>> +      _ -> fail "Unexpected non-file tar entry"
>> +  writeBasic _ _ =  fail "Error in basic tar file"
> For all I can tell, this extracts the tmp-{pristine,inventories} lists
> from the tarball and starts to copy them from cache. We do this
> concurrently with unpacking the tarball. Seems that first wins. I am
> however wondering, if the cache we are using here only contains local
> sources? (I.e. caches and possibly repos...) In that case, it would make
> some sense, although I don't see any exception handling in fetchFiles
> (below). The overall idea is interesting, but I don't see how it saves
> any bandwidth (or time): we still need to download the complete
> basic.tar.gz, right?
>
> Oh. I see now that you stop the unpacking and the download once you have
> a *first* hit... I think that's wrong though. It is, if nothing else,
> prone to races: either of the threads could win. If the tarball loses,
> it is not used at all, which could happen even if you don't have
> anything cached. Nevertheless, it is still downloaded, at least until
> the process is terminated, slowing down the process of getting any
> pristine content and patches that may be missing in the cache. Moreover,
> it could also happen that even though you have almost everything in the
> cache, the tarball is fully downloaded.

Well, yes, I'm making an assumption here that connection is fairly
shared between the threads. So from the one end of filelist (tmp-*) a
pack is downloading, from the other files are downloading one-by-one
using cache (if there is one), and when they meet somewhere in
between, download is done. The good thing here is that darcs doesn't
try to be too smart here: for example, the http source might be on
localhost, and "local" cache might be on the network share. I'm not
sure I understand the part about races, could you please explain how
exactly the threads may win/lose here?

> So instead it would be good if we could stop downloading once we have
> all the files. Also, a solution like that wouldn't need the
> tmp-{pristine,inventories} lists nor the concurrency, since it would
> need to parse the directory entries and inventories anyway and could see
> whether anything is still missing. The tarball would come in the mtime
> order as it already does (but hashed_inventory needs to come first, in
> any case). Then, you could just maintain a list of hashes that are still
> missing as you are processing the tarball. Once that list is empty, you
> stop. The list starts with pristine root, which is stored in
> hashed_inventory, and every time an item is removed from the list,
> anything it mentions (this is where you need to parse things) and we
> don't have yet (you look in the cache for those) is added to it. There
> are utilities in Storage.Hashed.Darcs that should help with that.

A bit I'm not sure about here is order in which tarball/cache are
used. Are we downloading a tarball here upto entry we have in cache?
What if there's different branch in cache and it's missing files in
the middle of list?

> Also, anything you unpack from the tarball should be linked into the
> cache if it's not there yet. If it is, the tarball copy should be
> probably ignored and you should hardlink from the cache instead (saves
> space).

OK.

> PS: I haven't had the chance to look at the performance of the patch
> yet. I won't apply it at least until I hear from you regarding the above
> comments -- I will try to benchmark it and see if it works at least in
> the usual cases. Something will nevertheless have to be done about the
> download of the tarball in cases where it is never used.

Unfortunately, I didn't test it much either. I've tried to download
packed darcs repo from localhost, the global cache was used, and the
threads were fairly interleaved.
msg11910 (view) Author: exlevan Date: 2010-08-02.05:05:29
Amendment to cache-related packs code, along with a couple more fixes.
Tarball contents now gets hadlinked to/from cache; filenames starting with
"tmp-" in basic.tar.gz are reserved for future use, tmp-{pristine,inventories}
renamed to tmp-filelist-{pristine,inventories}; irrelevant inventories are no
longer packed to basic tarball (that should resolve issue1889, currently there
is 90% of waste in basic.tar.gz on darcs.net).

Tarballs are still have to be downloaded completely before their content can
be read; there seems to be no way current darcs code can be used to download
files in stream-like fashion, and getting this functionality would require
substantional refactoring/rewrite of Darcs.URL.

3 patches for repository http://darcs.net:

Sun Aug  1 09:03:49 MSD 2010  Alexey Levan <exlevan@gmail.com>
  * Hardlink files while getting a packed repository

Mon Aug  2 06:49:14 MSD 2010  Alexey Levan <exlevan@gmail.com>
  * Use cache while getting a packed repository

Mon Aug  2 07:54:45 MSD 2010  Alexey Levan <exlevan@gmail.com>
  * Minimize the number of packed inventories
Attachments
msg11930 (view) Author: kowey Date: 2010-08-03.17:26:21
Just sending the tennis ball back to Petr.
msg11945 (view) Author: mornfall Date: 2010-08-04.18:10:55
Hi,

Alexey Levan <bugs@darcs.net> writes:
> Amendment to cache-related packs code, along with a couple more fixes.
> Tarball contents now gets hadlinked to/from cache; filenames starting with
> "tmp-" in basic.tar.gz are reserved for future use, tmp-{pristine,inventories}
> renamed to tmp-filelist-{pristine,inventories}; irrelevant inventories are no
> longer packed to basic tarball (that should resolve issue1889, currently there
> is 90% of waste in basic.tar.gz on darcs.net).

Some high-level remarks first: I would like the "tmp-" prefix to be
changed to something meaningful. Probably "meta-".

> Tarballs are still have to be downloaded completely before their content can
> be read; there seems to be no way current darcs code can be used to download
> files in stream-like fashion, and getting this functionality would require
> substantional refactoring/rewrite of Darcs.URL.

Ok, I agree this is not completely easy with the way URL (not to be
confused with Darcs.URL!) works. My idea was that we would just use the
HTTP library directly in fetchFileLazyPS when the thing to fetch was an
HTTP url and fall back to copyAndReadFile otherwise (which would still
involve a copy).

The code would look something like (just drafted in this mail, may not
compile nor work):

fetchFileLazyPS url c
    | isUrl url && "http" `isPrefixOf` url = do -- maybe need to case-convert url here
        rsp <- fetch repo_url
        unless (rspCode rsp == (2, 0, 0)) $ fallback -- just in case
        return $ rspBody rsp
    | otherwise = fallback
  where fallback = copyAndReadFile BL.readFile url c
        fetch url =
          do (_, rsp) <- browse $ do setCheckForProxy True
                                     setOutHandler (const $ return ())
                                     request (mkRequest GET url)
             return rsp

I guess functionality like this would be rather useful for the remaining
code in these patches.

Overall, I am going to accept this bundle, however I am still requesting
a few fixes to be done on top. One is the "tmp-" -> "meta-" (or
something more suitable) rename I mentioned above. Another would be to
make the tarball unpacking more tolerant/flexible.

Something like

unpackBasic cache tar@(Tar.Next x xs) lists = do
   withTarFile x $ \p c -> case p of
       _ | "meta-" `isPrefixOf` takeFileName p -> processMeta cache tar lists
         | otherwise -> unpackFile p c xs
   unpackBasic cache xs lists

processMeta cache (Tar.Next x xs) lists = do
   lists' <- withTarFile x process_one_meta
   if (have_both lists) then ... forkIO ...
                        else unpackBasic cache xs lists'
  where process_one_meta p c = ... -- update "lists"

i.e. you process files as they come and unpack them as they come and
when you run into the right "meta-" files you use them (if you know how)
or discard them (if you don't).

That way, the only requirement for the basic tarball is that it contains
_darcs/hashed_inventory (anywhere) and _darcs/pristine.hashed and that
any files that should not end up under _darcs are prefixed with meta-
(these have special meaning for the unpacker). Hopefully, these
precautions will make our life in the future a lot easier, if we need to
change how the basic tarball is built (say we decide that we want to add
a meta- file at the end of the basic tarball to carry the filecache or
something akin to that -- a darcs that doesn't know about filecache
should just discard that meta- file, and it should certainly not crash
upon doing a "get"). (Sorry I forgot the new name for filecache... was
that patchindex?)

I think we want to have both these before releasing a stable darcs that
can use packs for download.

Some more comments are interspersed in the patches below.

I am going to compile & test the patch soon-ish and then push. Please
record any further changes as new patches and send them as a new bundle.

Yours,
   Petr.

Hardlink files while getting a packed repository
------------------------------------------------
> hunk ./src/Darcs/Repository.hs 49
>      ) where
>  
>  import System.Exit ( ExitCode(..), exitWith )
> -import Data.List ( isSuffixOf )
> +import Data.List ( isPrefixOf )
>  import Data.Maybe( catMaybes )
>  
>  import Darcs.Repository.State( readRecorded, readUnrecorded, readWorking, unrecordedChanges
> hunk ./src/Darcs/Repository.hs 79
>      )
>  import Darcs.Repository.Merge( tentativelyMergePatches, considerMergeToWorking )
>  import Darcs.Repository.Cache ( unionRemoteCaches, fetchFileUsingCache,
> -                                speculateFileUsingCache, HashedDir(..), Cache(..), CacheLoc(..), WritableOrNot(..))
> +                                speculateFileUsingCache, HashedDir(..), Cache(..), CacheLoc(..), WritableOrNot(..), CacheType(Directory) )
>  import Darcs.Patch.Set ( PatchSet(..), SealedPatchSet, newset2RL, newset2FL, progressPatchSet )
>  #ifdef GADT_WITNESSES
>  import Darcs.Patch.Set ( Origin )
> hunk ./src/Darcs/Repository.hs 87
>  import URL ( maxPipelineLength )
>  
>  import Control.Applicative ( (<$>) )
> -import Control.Monad ( unless, when )
> +import Control.Monad ( unless, when , (>=>) )
>  import System.Directory ( createDirectory, renameDirectory,
> hunk ./src/Darcs/Repository.hs 89
> -                          createDirectoryIfMissing, renameFile )
> +                          createDirectoryIfMissing, renameFile,
> +                          doesFileExist, removeFile, getDirectoryContents )
>  import System.IO.Error ( isAlreadyExistsError )
> hunk ./src/Darcs/Repository.hs 92
> +import System.Posix.Files ( createLink )
>  
>  import qualified Darcs.Repository.DarcsRepo as DarcsRepo
>  import qualified Darcs.Repository.HashedRepo as HashedRepo
> hunk ./src/Darcs/Repository.hs 132
>  import Storage.Hashed( writePlainTree )
>  import ByteStringUtils( gzReadFilePS )
>  
> -import System.FilePath( (</>) )
> +import System.FilePath( (</>), takeFileName, splitPath, joinPath
> +                      , takeDirectory )
>  import qualified Codec.Archive.Tar as Tar
>  import Codec.Compression.GZip ( compress, decompress )
>  import qualified Data.ByteString.Char8 as BS
> hunk ./src/Darcs/Repository.hs 290
>    createDirectoryIfMissing False $ toDir </> darcsdir </> "pristine.hashed"
>    createDirectoryIfMissing False $ toDir </> darcsdir </> "patches"
>    copySources toRepo fromDir
> +  Repo _ _ _ (DarcsRepository _ toCache3) <-
> +    identifyRepositoryFor toRepo "."
> +  let
> +    cs = case toCache3 of
> +      Ca cs' -> catMaybes . flip map cs' $ \x -> case x of
> +        Cache Directory Writable x' -> Just x'
> +        _ -> Nothing
> +    ca = if not (null cs) then Just (head cs) else Nothing
>    -- unpack inventory & pristine cache
> hunk ./src/Darcs/Repository.hs 299
> -  writeCompressed . Tar.read $ decompress b
> +  procBasicTar ca . Tar.read $ decompress b
>    createPristineDirectoryTree toRepo "."
>    -- pull new patches
>    us <- readRepo toRepo
> hunk ./src/Darcs/Repository.hs 315
>    -- get old patches
>    unless (any (`elem` opts) [Partial, Lazy, Ephemeral]) $ do
>      putInfo "Copying patches, to get lazy repository hit ctrl-C..."
> -    writeCompressed . Tar.read . decompress =<< fetchFileLazyPS (fromPacksDir ++
> +    procPatches ca . Tar.read . decompress =<< fetchFileLazyPS (fromPacksDir ++
>        "patches.tar.gz") Uncachable
>   where
> hunk ./src/Darcs/Repository.hs 318
> -  writeCompressed Tar.Done = return ()
> -  writeCompressed (Tar.Next x xs) = case Tar.entryContent x of
> -    Tar.NormalFile x' _ -> do
> -      let p = Tar.entryPath x
> -      withTemp $ \p' -> do
> -        BL.writeFile p' $ if "hashed_inventory" `isSuffixOf` p
> -          then x'
> -          else compress x'
> -        renameFile p' p
> -      writeCompressed xs
> +  procBasicTar ca = procHashedInv >=> procTmp >=> procFiles ca
> +  procPatches = procFiles
> +  procHashedInv Tar.Done = fail
> +    "Unexpected end of file; hashed_inventory expected"
> +  procHashedInv (Tar.Next x xs) = withTarFile x $ \p c ->
> +    if "hashed_inventory" == takeFileName p
> +      then do
> +        writeFile' Nothing p c
> +        return xs
> +      else fail $ "Unexpected file: " ++ takeFileName p ++
> +        "\nhashed_inventory expected"
> +  procHashedInv (Tar.Fail e) = fail e
> +  procTmp Tar.Done = return Tar.Done
> +  procTmp xxs@(Tar.Next x xs) = withTarFile x $ \p c ->
> +    if "tmp-" `isPrefixOf` p
> +      then do
> +        BL.writeFile p c
> +        procTmp xs
> +      else do
> +        mapM removeFile . filter ("tmp-" `isPrefixOf`) =<<
> +          getDirectoryContents "."
> +        return xxs
> +  procTmp (Tar.Fail e) = fail e
> +  procFiles _ Tar.Done = return ()
> +  procFiles ca (Tar.Next x xs) = withTarFile x $ \p c -> do
> +    writeFile' ca p $ compress c
> +    procFiles ca xs
> +  procFiles _ (Tar.Fail e) = fail e
> +  withTarFile x f = case Tar.entryContent x of
> +    Tar.NormalFile x' _ -> f (Tar.entryPath x) x'
>      _ -> fail "Unexpected non-file tar entry"
> hunk ./src/Darcs/Repository.hs 349
> -  writeCompressed (Tar.Fail e) = fail e
> +  writeFile' Nothing x y = withTemp $ \x' -> do
> +    BL.writeFile x' y
> +    renameFile x' x
> +  writeFile' (Just ca) x y = do
> +    let x' = joinPath . tail $ splitPath x -- drop darcsdir
> +    ex <- doesFileExist $ ca </> x'
> +    if ex
> +      then createLink' (ca </> x') x
> +      else withTemp $ \x'' -> do
> +        BL.writeFile x'' y
> +        createLink' x'' $ ca </> x'
> +        renameFile x'' x
> +  createLink' x y = do
> +    createDirectoryIfMissing True $ takeDirectory y
> +    createLink x y `catchall` return ()
>    putInfo = when (not $ Quiet `elem` opts) . putStrLn

The code above is basically correct, with two caveats:

- multiple write-able caches are not supported (probably rare, but
  supported by other parts of darcs)... I guess this is not a
  showstopper (and we may end up removing this feature anyway)
- the basic tarball only handles the (reserved) tmp-* files between
  hashed_inventory and the rest of the tarball... I'd more appreciate if
  they were ignored anywhere encountered

Use cache while getting a packed repository
-------------------------------------------

[snip]

> hunk ./src/Darcs/Commands/Optimize.lhs 372
>        gzs <- filter ((== ".gz") . takeExtension) `fmap` getDirectoryContents "."
>        mapM_ removeFile gzs
>  
> -doOptimizeHTTP :: IO ()
> -doOptimizeHTTP = do
> +doOptimizeHTTP :: RepoPatch p => Repository p C(r u t) -> IO ()
> +doOptimizeHTTP repo = do
>    rf <- either fail return =<< identifyRepoFormat "."
>    unless (formatHas HashedInventory rf) . fail $
>      "Unsupported repository format:\n" ++
> hunk ./src/Darcs/Commands/Optimize.lhs 379
>      "  only hashed repositories can be optimized for HTTP"
>    createDirectoryIfMissing False packsDir
> -  ps <- dirContents' "patches" $ \x -> all (x /=) ["unrevert", "pending",
> -    "pending.tentative"]
> +  ps <- mapRL hashedPatchFileName . newset2RL <$> readRepo repo
>    BL.writeFile (patchesTar <.> "part") . compress . write =<<
>      mapM fileEntry' ps
>    renameFile (patchesTar <.> "part") patchesTar
> hunk ./src/Darcs/Commands/Optimize.lhs 383
> -  let i = darcsdir </> "hashed_inventory"
> -  is <- dirContents "inventories"
> -  pr <- dirContents "pristine.hashed"
> -  BL.writeFile (basicTar <.> "part") . compress . write =<<
> -    mapM fileEntry' (i : (is ++ pr))
> +  is <- sortByMTime =<< dirContents "inventories"
> +  writeFile (darcsdir </> "tmp-filelist-inventories") . unlines $
> +    map takeFileName is
> +  pr <- sortByMTime =<< dirContents "pristine.hashed"
> +  writeFile (darcsdir </> "tmp-filelist-pristine") . unlines $
> +    map takeFileName pr
This writeFile (darcsdir </> ("tmp-" ++ name) . unlines $ map
takeFileName list) could probably live in a helper function (writeList?)

> +  BL.writeFile (basicTar <.> "part") . compress . write =<< mapM fileEntry' (
> +    [ darcsdir </> "hashed_inventory"
> +    , darcsdir </> "tmp-filelist-pristine"
> +    , darcsdir </> "tmp-filelist-inventories"
> +    ] ++ reverse pr ++ reverse is)
>    renameFile (basicTar <.> "part") basicTar
> hunk ./src/Darcs/Commands/Optimize.lhs 395
> +  removeFile $ darcsdir </> "tmp-filelist-inventories"
> +  removeFile $ darcsdir </> "tmp-filelist-pristine"
Should this go into a finally block to clean up in case of failure as
well?

>   where
>    packsDir = darcsdir </> "packs"
>    basicTar = packsDir </> "basic.tar.gz"
> hunk ./src/Darcs/Commands/Optimize.lhs 408
>    dirContents d = dirContents' d $ const True
>    dirContents' d f = map ((darcsdir </> d) </>) . filter (\x ->
>      head x /= '.' && f x) <$> getDirectoryContents (darcsdir </> d)
> +  hashedPatchFileName x = case extractHash x of
> +    Left _ -> fail "unexpected unhashed patch"
> +    Right h -> darcsdir </> "patches" </> h
> +  sortByMTime xs = map snd . sort <$> mapM (\x -> (\t -> (t, x)) <$>
> +    getModificationTime x) xs
>  \end{code}

[snip]

> hunk ./src/Darcs/Repository.hs 291
>    let toRepo :: Repository p C(r u r) -- In empty repo, t(entative) = r(ecorded)
>        toRepo = Repo toDir opts toFormat $ DarcsRepository toPristine toCache2
>        fromPacksDir = fromDir ++ "/" ++ darcsdir ++ "/packs/"
> -  createDirectoryIfMissing False $ toDir </> darcsdir </> "inventories"
> -  createDirectoryIfMissing False $ toDir </> darcsdir </> "pristine.hashed"
> -  createDirectoryIfMissing False $ toDir </> darcsdir </> "patches"
> +  createDirectoryIfMissing False $ darcsdir </> "inventories"
>    copySources toRepo fromDir
>    Repo _ _ _ (DarcsRepository _ toCache3) <-
>      identifyRepositoryFor toRepo "."
> hunk ./src/Darcs/Repository.hs 295
> -  let
> -    cs = case toCache3 of
> -      Ca cs' -> catMaybes . flip map cs' $ \x -> case x of
> -        Cache Directory Writable x' -> Just x'
> -        _ -> Nothing
> -    ca = if not (null cs) then Just (head cs) else Nothing
>    -- unpack inventory & pristine cache
> hunk ./src/Darcs/Repository.hs 296
> -  procBasicTar ca . Tar.read $ decompress b
> +  cleanDir "pristine.hashed"
> +  procBasic toCache3 . Tar.read $ decompress b
>    createPristineDirectoryTree toRepo "."
>    -- pull new patches
>    us <- readRepo toRepo
> hunk ./src/Darcs/Repository.hs 311
>      applyToWorking toRepo opts pw
>      return ()
>    -- get old patches
> +  cleanDir "patches"
>    unless (any (`elem` opts) [Partial, Lazy, Ephemeral]) $ do
>      putInfo "Copying patches, to get lazy repository hit ctrl-C..."
> hunk ./src/Darcs/Repository.hs 314
> -    procPatches ca . Tar.read . decompress =<< fetchFileLazyPS (fromPacksDir ++
> -      "patches.tar.gz") Uncachable
> +    mv <- newEmptyMVar
> +    _ <- forkIO . flip finally (putMVar mv ()) .
> +      fetchFiles toCache3 HashedPatchesDir . mapFL hashedPatchFileName $
> +      newset2FL us
> +    procPatches toCache3 . Tar.read . decompress =<<
> +      fetchFileLazyPS (fromPacksDir ++ "patches.tar.gz") Uncachable
> +    takeMVar mv
>   where
> hunk ./src/Darcs/Repository.hs 322
> -  procBasicTar ca = procHashedInv >=> procTmp >=> procFiles ca
> -  procPatches = procFiles
> -  procHashedInv Tar.Done = fail
> +  procBasic = procHashedInv
> +  procPatches ca = procFiles $ cacheDir ca
> +  procHashedInv _ Tar.Done = fail
>      "Unexpected end of file; hashed_inventory expected"
> hunk ./src/Darcs/Repository.hs 326
> -  procHashedInv (Tar.Next x xs) = withTarFile x $ \p c ->
> +  procHashedInv ca (Tar.Next x xs) = withTarFile x $ \p c ->
>      if "hashed_inventory" == takeFileName p
>        then do
>          writeFile' Nothing p c
> hunk ./src/Darcs/Repository.hs 330
> -        return xs
> +        procTmp ca xs
>        else fail $ "Unexpected file: " ++ takeFileName p ++
>          "\nhashed_inventory expected"
> hunk ./src/Darcs/Repository.hs 333
> -  procHashedInv (Tar.Fail e) = fail e
> -  procTmp Tar.Done = return Tar.Done
> -  procTmp xxs@(Tar.Next x xs) = withTarFile x $ \p c ->
> -    if "tmp-" `isPrefixOf` p
> +  procHashedInv _ (Tar.Fail e) = fail e
> +  procTmp _ Tar.Done = return ()
> +  procTmp ca xxs@(Tar.Next x xs) = withTarFile x $ \p c ->
> +    if "tmp-" `isPrefixOf` takeFileName p
>        then do
>          BL.writeFile p c
> hunk ./src/Darcs/Repository.hs 339
> -        procTmp xs
> +        procTmp ca xs
>        else do
> hunk ./src/Darcs/Repository.hs 341
> -        mapM removeFile . filter ("tmp-" `isPrefixOf`) =<<
> -          getDirectoryContents "."
> -        return xxs
> -  procTmp (Tar.Fail e) = fail e
> +        ex <- and <$> mapM doesFileExist
> +          [ darcsdir </> "tmp-filelist-pristine"
> +          , darcsdir </> "tmp-filelist-inventories"
> +          ]
> +        if ex
> +          then do
> +            mv <- newEmptyMVar
> +            _ <- forkIO . flip finally (putMVar mv ()) $ do
> +              fetchFiles ca HashedInventoriesDir . lines =<<
> +                readFile (darcsdir </> "tmp-filelist-inventories")
> +              fetchFiles ca HashedPristineDir . lines =<<
> +                readFile (darcsdir </> "tmp-filelist-pristine")
> +            procFiles (cacheDir ca) xxs
> +            takeMVar mv
> +          else procFiles (cacheDir ca) xxs
> +        mapM_ removeFile . (map (darcsdir </>)) .
> +          filter (("tmp-" `isPrefixOf`) . takeFileName) =<<
> +          getDirectoryContents darcsdir
> +  procTmp _ (Tar.Fail e) = fail e
>    procFiles _ Tar.Done = return ()
>    procFiles ca (Tar.Next x xs) = withTarFile x $ \p c -> do
> hunk ./src/Darcs/Repository.hs 362
> -    writeFile' ca p $ compress c
> -    procFiles ca xs
> +    ex <- doesFileExist p
> +    if ex
> +      then debugMessage $ "Tar thread: STOP " ++ p
> +      else do
> +        writeFile' ca p $ compress c
> +        debugMessage $ "Tar thread: GET " ++ p
> +        procFiles ca xs
>    procFiles _ (Tar.Fail e) = fail e
>    withTarFile x f = case Tar.entryContent x of
>      Tar.NormalFile x' _ -> f (Tar.entryPath x) x'
> hunk ./src/Darcs/Repository.hs 389
>      createDirectoryIfMissing True $ takeDirectory y
>      createLink x y `catchall` return ()
>    putInfo = when (not $ Quiet `elem` opts) . putStrLn
> -
> +  fetchFiles _ _ [] = return ()
> +  fetchFiles c d (f:fs) = do
> +    ex <- doesFileExist $ darcsdir </> hashedDir d </> f
> +    if ex
> +      then debugMessage $ "Cache thread: STOP " ++
> +        (darcsdir </> hashedDir d </> f)
> +      else do
> +        debugMessage $ "Cache thread: GET " ++
> +          (darcsdir </> hashedDir d </> f)
> +        fetchFileUsingCache c d f
> +        fetchFiles c d fs
> +  hashedPatchFileName x = case extractHash x of
> +    Left _ -> fail "unexpected unhashed patch"
> +    Right h -> h
> +  cacheDir (Ca cs) = let
> +    cs' = catMaybes . flip map cs $ \x -> case x of
> +      Cache Directory Writable x' -> Just x'
> +      _ -> Nothing
> +   in
> +    if not (null cs') then Just (head cs') else Nothing
> +  cleanDir d = mapM_ (\x -> removeFile $ darcsdir </> d </> x) .
> +    filter (\x -> head x /= '.') =<< getDirectoryContents (darcsdir </> d)
> + 
>  -- | writePatchSet is like patchSetToRepository, except that it doesn't
>  -- touch the working directory or pristine cache.
>  writePatchSet :: RepoPatch p => PatchSet p C(Origin x) -> [DarcsFlag] -> IO (Repository p C(r u t))

Ok, I can accept this as well. The "tmp-" to "meta-" rename can be done
as a patch on top these 3 in this bundle.

Minimize the number of packed inventories
-----------------------------------------

Looks reasonable. I guess the below code may error out if the current
repository is lazy (due to those (Ca []) bits) but this is not much of a
problem. We can polish the error this would give later.

> hunk ./src/Darcs/Commands/Optimize.lhs 383
>    BL.writeFile (patchesTar <.> "part") . compress . write =<<
>      mapM fileEntry' ps
>    renameFile (patchesTar <.> "part") patchesTar
> -  is <- sortByMTime =<< dirContents "inventories"
> +  is <- map ((darcsdir </> "inventories") </>) <$> HashedRepo.listInventories
>    writeFile (darcsdir </> "tmp-filelist-inventories") . unlines $
>      map takeFileName is
>    pr <- sortByMTime =<< dirContents "pristine.hashed"
> hunk ./src/Darcs/Repository/HashedRepo.hs 29
>                                       addToTentativeInventory, removeFromTentativeInventory,
>                                       readRepo, readTentativeRepo, writeAndReadPatch,
>                                       writeTentativeInventory, copyRepo,
> -                                     readHashedPristineRoot, pris2inv, copySources
> +                                     readHashedPristineRoot, pris2inv, copySources,
> +                                     listInventories
>                                     ) where
>  
>  import System.Directory ( createDirectoryIfMissing )
> hunk ./src/Darcs/Repository/HashedRepo.hs 39
>  import System.IO ( stderr, hPutStrLn )
>  import Data.List ( delete, filter )
>  import Control.Monad ( unless )
> +import Control.Applicative ( (<$>) )
>  
>  import Workaround ( renameFile )
>  import Darcs.Flags ( DarcsFlag, Compression, RemoteDarcs )
> hunk ./src/Darcs/Repository/HashedRepo.hs 404
>                    _ -> return ([],i)
>      return $ reverse (readPatchIds str) : rest
>  
> +listInventories :: IO [String]
> +listInventories = do
> +  x <- fst <$> readInventoryPrivate (Ca []) darcsdir "hashed_inventory"
> +  case x of
> +    Nothing -> return []
> +    Just x' -> f x'
> + where
> +  f i = do
> +    x <- fst <$> readInventoryPrivate (Ca []) (darcsdir </> "inventories") i
> +    (i :) <$> case x of
> +      Nothing -> return []
> +      Just x' -> f x'
> +
>  readPatchIds :: B.ByteString -> [(PatchInfo, String)]
>  readPatchIds inv | B.null inv = []
>  readPatchIds inv = case readPatchInfo inv of
msg11951 (view) Author: mornfall Date: 2010-08-04.19:13:56
Hi,

during a short test, I have noticed that the current optimize --http
also includes extra pristine files (the same problem as with inventories
which you solved in your last patch). A simple solution would be to
clean the pristine dir (just like "darcs optimize" does) before building
the basic tarball. (Another approach would be similar to your
listInventories fix.)

You may also want to look at patch326 to help you out with testing. (To
see what was going on, I tacked a --debug to the get and exit 1 right
after it in the get-http-packed test. You also need to install lighttpd
for the tests to work.)

Nevertheless, it seems that the code works OK and it's not introducing
any test failures, so I am going to push now (together with a patch that
re-enables your code on HEAD).

Yours,
   Petr.
msg11952 (view) Author: darcswatch Date: 2010-08-04.20:14:00
This patch bundle (with 3 patches) was just applied to the repository http://darcs.net/.
This message was brought to you by DarcsWatch
http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-a69330c8c7e8a94daaa67393f0c402c86ea64a28
msg14363 (view) Author: darcswatch Date: 2011-05-10.21:36:46
This patch bundle (with 3 patches) was just applied to the repository http://darcs.net/reviewed.
This message was brought to you by DarcsWatch
http://darcswatch.nomeata.de/repo_http:__darcs.net_reviewed.html#bundle-a69330c8c7e8a94daaa67393f0c402c86ea64a28
History
Date User Action Args
2010-07-03 08:48:12exlevancreate
2010-07-03 08:49:04darcswatchsetdarcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-0acab7625bea5064cb5422350264b590fa9569bd
2010-07-04 19:12:49mornfallsetnosy: + mornfall
messages: + msg11678
2010-07-06 19:40:54ganeshsetassignedto: mornfall
2010-07-06 20:37:12mornfallsetstatus: needs-review -> in-discussion
2010-07-14 11:01:07exlevansetmessages: + msg11742
2010-07-28 17:52:59koweysetassignedto: mornfall -> exlevan
2010-08-02 05:05:29exlevansetstatus: in-discussion -> needs-review
files: + hardlink-files-while-getting-a-packed-repository.dpatch, unnamed
messages: + msg11910
2010-08-02 05:08:12darcswatchsetdarcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-0acab7625bea5064cb5422350264b590fa9569bd -> http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-a69330c8c7e8a94daaa67393f0c402c86ea64a28
2010-08-03 17:26:21koweysetassignedto: exlevan -> mornfall
messages: + msg11930
nosy: + kowey
2010-08-04 18:10:56mornfallsetstatus: needs-review -> accepted-pending-tests
messages: + msg11945
2010-08-04 19:13:57mornfallsetmessages: + msg11951
2010-08-04 20:14:00darcswatchsetstatus: accepted-pending-tests -> accepted
messages: + msg11952
2011-05-10 21:36:19darcswatchsetdarcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-a69330c8c7e8a94daaa67393f0c402c86ea64a28 -> http://darcswatch.nomeata.de/repo_http:__darcs.net_reviewed.html#bundle-0acab7625bea5064cb5422350264b590fa9569bd
2011-05-10 21:36:46darcswatchsetmessages: + msg14363