Created on 2014-04-06.17:56:05 by mdiaz, last changed 2014-04-10.14:03:08 by gh.
File name |
Status |
Uploaded |
Type |
Edit |
Remove |
accept-issue1987_-regression-tests-for-garbage-collection-for.dpatch
|
|
mdiaz,
2014-04-09.01:27:37
|
application/x-darcs-patch |
|
|
accept-issue1987_-regression-tests-for-garbage-collection-for.dpatch
|
|
mdiaz,
2014-04-10.01:53:27
|
application/x-darcs-patch |
|
|
patch-preview.txt
|
|
mdiaz,
2014-04-06.17:56:05
|
text/x-darcs-patch |
|
|
patch-preview.txt
|
|
mdiaz,
2014-04-07.19:30:01
|
text/x-darcs-patch |
|
|
patch-preview.txt
|
|
mdiaz,
2014-04-08.10:40:49
|
text/x-darcs-patch |
|
|
patch-preview.txt
|
|
mdiaz,
2014-04-09.01:27:37
|
text/x-darcs-patch |
|
|
patch-preview.txt
|
|
mdiaz,
2014-04-10.01:53:27
|
text/x-darcs-patch |
|
|
resolve-issue1987_-garbage-collection-for-inventories-and-patches_.dpatch
|
|
mdiaz,
2014-04-06.17:56:05
|
application/x-darcs-patch |
|
|
resolve-issue1987_-garbage-collection-for-inventories-and-patches_.dpatch
|
|
mdiaz,
2014-04-07.19:30:01
|
application/x-darcs-patch |
|
|
resolve-issue1987_-garbage-collection-for-inventories-and-patches_.dpatch
|
|
mdiaz,
2014-04-08.10:40:49
|
application/x-darcs-patch |
|
|
unnamed
|
|
mdiaz,
2014-04-06.17:56:05
|
|
|
|
unnamed
|
|
mdiaz,
2014-04-07.19:30:01
|
|
|
|
unnamed
|
|
mdiaz,
2014-04-08.10:40:49
|
|
|
|
unnamed
|
|
mdiaz,
2014-04-09.01:27:37
|
|
|
|
unnamed
|
|
mdiaz,
2014-04-10.01:53:27
|
|
|
|
See mailing list archives
for discussion on individual patches.
msg17269 (view) |
Author: mdiaz |
Date: 2014-04-06.17:56:05 |
|
Updated [patch1128]: now I check that the inventory file exist
before trying to read it (function readSafeInventoryPrivate).
1 patch for repository http://darcs.net/screened:
Sun Apr 6 14:48:28 ART 2014 marcio.diaz@gmail.com
* resolve issue1987: Garbage collection for inventories and patches.
Attachments
|
msg17278 (view) |
Author: gh |
Date: 2014-04-07.18:21:20 |
|
Hi Marcio,
first, I recommend you to set your author string with the pattern
"Full Name <email@aaaa.aa>" since we have some scripts that catch that
format to generate the news bulletin. And it's more standard.
Also when sending an amended version of a patch, it is better to send
it with the same ticket number, in the present case it would have
been:
darcs send --mail --subject '[patch1128]'
Let's continue with that ticket now (1134).
Code review (I'm doing it in an order that makes more sense to me):
> hunk ./src/Darcs/UI/Commands/Optimize.hs 167
> - "\n" ++
> - "There is one more optimization which CAN NOT be performed by this\n" ++
> - "command. Every time your record a patch, a new inventory file is\n" ++
> - "written to `_darcs/inventories/`, and old inventories are never reaped.\n" ++
> - "\n" ++
> - "If `_darcs/inventories/` is consuming a relatively large amount of\n" ++
> - "space, you can safely reclaim it by using `darcs get` to make a\n" ++
> - "complete copy of the repo. When doing so, don't forget to copy over\n" ++
> - "any unsaved changes you have made to the working tree or to\n" ++
> - "unversioned files in `_darcs/prefs/` (such as `_darcs/prefs/author`).\n"
> + "\n"
> hunk ./src/Darcs/UI/Commands/Optimize.hs 198
> - cleanRepository repository -- garbage collect pristine.hashed directory
> + cleanRepository repository -- garbage collect pristine.hashed, inventories and patches directories
OK.
> hunk ./src/Darcs/Repository/Internal.hs 182
> + , cleanInventories
> + , cleanPatches
> hunk ./src/Darcs/Repository/Internal.hs 895
> - HvsO { hashed = HashedRepo.cleanPristine repository,
> + HvsO { hashed = cleanHashedRepo repository,
> hunk ./src/Darcs/Repository/Internal.hs 897
> + where
> + cleanHashedRepo r = do
> + HashedRepo.cleanPristine r
> + HashedRepo.cleanInventories r
> + HashedRepo.cleanPatches r
OK, I'm fine with cleaning everything at once.
> hunk ./src/Darcs/Repository/HashedRepo.hs 22
> + , cleanInventories
> + , cleanPatches
> hunk ./src/Darcs/Repository/HashedRepo.hs 57
> +import qualified Data.Set as Set
> hunk ./src/Darcs/Repository/HashedRepo.hs 64
> -import System.Directory ( createDirectoryIfMissing )
> +import System.Directory ( createDirectoryIfMissing, getDirectoryContents, doesFileExist )
> hunk ./src/Darcs/Repository/HashedRepo.hs 85
> + , removeFileMayNotExist
> hunk ./src/Darcs/Repository/HashedRepo.hs 140
> +patchesDir, patchesDirPath :: String
> +patchesDir = "patches"
> +patchesDirPath = makeDarcsdirPath patchesDir
> +
OK.
> hunk ./src/Darcs/Repository/HashedRepo.hs 241
> +-- |cleanInventories removes any obsolete (unreferenced) files in the
> +-- inventories directory
> +cleanInventories :: Repository p wR wU wT -> IO ()
> +cleanInventories _ = do
> + debugMessage "Cleaning out inventories..."
> + hs <- listInventories
This calls the function that returns you a list of inventory hashes,
jumping from one "Starting With:" to another, starting with the hash
given in hashed_inventory. This will give us the good inventory files,
the ones we want to keep.
> + fs <- filter okayHash <$> getDirectoryContents inventoriesDirPath
This lists the files of _darcs/inventories , only keeping those that
have the name length of a hashed file.
A side-effect of that is that we're not going to clean any random file
that could have been added manually (_darcs/inventories/SOME_FILE).
Maybe that's not a problem.
> + mapM_ (removeFileMayNotExist . (inventoriesDirPath </>))
> + (unset $ (set fs) `Set.difference` (set hs))
> + where set = Set.fromList . map BC.pack
> + unset = map BC.unpack . Set.toList
> +
And we're removing those files. Seems ok.
> +-- |cleanPatches removes any obsolete (unreferenced) files in the
> +-- patches directory
> +cleanPatches :: Repository p wR wU wT -> IO ()
> +cleanPatches _ = do
> + debugMessage "Cleaning out patches..."
> + hs <- listPatches
> + fs <- filter okayHash <$> getDirectoryContents patchesDirPath
> + mapM_ (removeFileMayNotExist . (patchesDirPath </>))
> + (unset $ (set fs) `Set.difference` (set hs))
> + where set = Set.fromList . map BC.pack
> + unset = map BC.unpack . Set.toList
> +
OK so what happens is that we only delete files that have the
"shape" of a hashed file (since we filter with `okayHash`).
Hence files like pending, pending.tentative, unrevert are not deleted, good.
The alternative would be to have a special "white list" of files we
don't want to delete,
and not use `okayHash`.
> hunk ./src/Darcs/Repository/HashedRepo.hs 457
> - inv <- skipPristine <$> gzFetchFilePS (dir </> invName) Uncachable
> - readInventoryFromContent (toPath dir ++ "/" ++ darcsdir ++ invName) inv
> + inv <- skipPristine <$> gzFetchFilePS (dir </> invName) Uncachable
> + readInventoryFromContent (toPath dir ++ "/" ++ darcsdir ++ invName) inv
> +
You can remove this hunk with `darcs amend --unrecord`.
> hunk ./src/Darcs/Repository/HashedRepo.hs 601
> - fst <$> readInventoryPrivate invDir inv
> + fst <$> readSafeInventoryPrivate invDir inv
> hunk ./src/Darcs/Repository/HashedRepo.hs 608
> + readSafeInventoryPrivate dir invName = do
> + b <- doesFileExist (dir </> invName)
> + if b then readInventoryPrivate dir invName
> + else return (Nothing, [])
> +
> +
You should define another function `listInventories`
that call `readSafeInventoryPrivate` because there are other
functions that call `listInventories` and you precisely want them
to retrieve abstent inventories, otherwise darcs believes that your
repository is almost empty... (this happens when
you're working in a lazy repository, the directory `_darcs/inventories`
is actually empty right after cloning).
You could call that separate function `listInventoriesLocal`,
and don't forget to update the Haddock comment.
Also I'm not sure about the name `readSafeInventoryPrivate` since
it's not a matter of safety really; you could call it
`readInventoryLocalPrivate`.
> +listPatches :: IO [String]
> +listPatches = do
> + inventory <- readSafeInventoryPrivate darcsdir hashedInventory
I'd also name it `listPatchesLocal` and put it a comment as for
`listInventories`.
Same comment as above about `readSafeInventoryPrivate`.
> + followStartingWiths (fst inventory) (getPatches inventory)
You're getting the hashes of the patches of hashed_inventory and
continuing with the next inventory in the "Starting With:" chain; good.
> + where
> + followStartingWiths Nothing patches = return patches
ok.
> + followStartingWiths (Just startingWith) patches = do
> + inv <- readSafeInventoryPrivate inventoriesDirPath startingWith
> + (patches++) <$> followStartingWiths (fst inv) (getPatches inv)
> +
> + getPatches inv = map snd (snd inv)
OK.
> +
> + readSafeInventoryPrivate dir invName = do
> + b <- doesFileExist (dir </> invName)
> + if b then readInventoryPrivate dir invName
> + else return (Nothing, [])
> +
Same remark as above.
|
msg17282 (view) |
Author: mdiaz |
Date: 2014-04-07.19:30:01 |
|
Thanks Guillaume, I updated the patch according to your comments.
1 patch for repository http://darcs.net/screened:
Mon Apr 7 16:14:05 ART 2014 marcio.diaz@gmail.com
* resolve issue1987: Garbage collection for inventories and patches.
Attachments
|
msg17283 (view) |
Author: gh |
Date: 2014-04-07.19:58:57 |
|
Good!
You could add to the haddock comments of functions
`listInventoriesLocal` and `listPatchesLocal`, an extra sentence saying
"This function does not attempt to download missing inventory files."
Now can you send, in a separate patch, a shell script that tests that
unreferenced files in directories
`_darcs/{pristine.hashed|inventories|patches}` are properly cleaned?
|
msg17284 (view) |
Author: gh |
Date: 2014-04-07.19:59:20 |
|
(In a separate patch, but in the same bundle, if that wasn't clear.)
|
msg17285 (view) |
Author: simon |
Date: 2014-04-07.22:33:31 |
|
I think this is needed quite badly by darcs hub, where multiple repos
are accumulating duplicate patch files excessively.
Can this patch help clean up existing repos, as well as prevent the
build-up in new ones ? How should I test it ?
|
msg17286 (view) |
Author: gh |
Date: 2014-04-07.22:44:30 |
|
Simon, the current behaviour of darcs is to clean reposirories only when
the command “optimize” is run, see <http://bugs.darcs.net/issue687>,
which ended with the following patch by David Roundy:
" resolve issue687: don't clean out pristine cache except when optimizing.
The trouble is that we can never tell when someone might be waiting to grab
one of these older files. Atomicity doesn't gain us much if we drop the
old files immediately. The downside is that now repositories will
monotonically grow until an optimize is performed. :("
|
msg17288 (view) |
Author: simon |
Date: 2014-04-07.23:00:13 |
|
Nice link.
I don't see any reduction in the number of files in patches/ after
running darcs optimize or darcs optimize --pristine.
|
msg17289 (view) |
Author: simon |
Date: 2014-04-07.23:35:44 |
|
Correction - after applying this patch, darcs optimize *does* clean
patches/ . Hurrah!
for d in _darcs/{inventories,patch_index,patches}; do find $d | wc -l; done
darcs2.9.8+1134 optimize
for d in _darcs/{inventories,patch_index,patches}; do find $d | wc -l; done
262
5
35837
Done optimizing!
13
5
2454
Here are the overall repo sizes:
cd ..; du -s *
305532 repo-original
33024 repo-optimized-with-patch1134
15540 repo-copied-with-get
This is most helpful, even though the optimized copy is still reported
as twice as large as a freshly got copy. Looking more closely with du
shows only a 5M difference between their respective _darcs directories,
and both have the same number of files. I assume this must be some kind
of normal filesystem/du reporting issue.
So.. how safe is it to run darcs optimize with this patch frequently on
all darcs hub repos ?
|
msg17290 (view) |
Author: mdiaz |
Date: 2014-04-08.10:40:49 |
|
Here goes the updated patch and the corresponding regression test.
2 patches for repository http://darcs.net/screened:
Mon Apr 7 18:16:43 ART 2014 Marcio Diaz <marcio.diaz@gmail.com>
* resolve issue1987: Garbage collection for inventories and patches.
Tue Apr 8 07:38:38 ART 2014 Marcio Diaz <marcio.diaz@gmail.com>
* Accept issue1987: regression tests for garbage collection for
inventories and patches.
Attachments
|
msg17292 (view) |
Author: simon |
Date: 2014-04-08.13:23:25 |
|
+1. Very clear and informative.
-Simon
|
msg17293 (view) |
Author: owst |
Date: 2014-04-08.14:24:41 |
|
Nice patch! A few comments:
I think there is a large amount of duplication between
cleanPatches/cleanInventories - can you abstract over the differences
and reduce duplication?
Why do you convert the [FilePath] to [ByteString] before converting to a
set, only to convert back after you've taken the intersection?
I think using a white list of files rather than okayHash would make the
intent more obvious.
|
msg17296 (view) |
Author: gh |
Date: 2014-04-08.15:46:54 |
|
> I think using a white list of files rather than okayHash would make the
> intent more obvious.
I just realized that Darcs.UI.Commands.Optimize.doOptimizeHTTP
actually did that :-)
So Marcio, I suggest that this list of exceptions gets defined in
Darcs.Repository.HashedRepo (as, say, 'specialPatches') and used in
both doOptimizeHTTP and cleanPatches. Also, what happens if you also
stop using okayHash for cleaning _darcs/inventories/ ?
|
msg17298 (view) |
Author: gh |
Date: 2014-04-08.17:40:07 |
|
> I just realized that Darcs.UI.Commands.Optimize.doOptimizeHTTP
> actually did that :-)
Sorry, that was true at the time doOptimizeHTTP was introduced but
currently no more.
The code was:
ps <- dirContents' "patches" $ \x -> all (x /=) ["unrevert", "pending",
"pending.tentative"]
|
msg17306 (view) |
Author: mdiaz |
Date: 2014-04-09.01:27:37 |
|
New patches!, I hope they are better than the previous.
2 patches for repository http://darcs.net/screened:
Tue Apr 8 22:12:18 ART 2014 Marcio Diaz <marcio.diaz@gmail.com>
* Accept issue1987: regression tests for garbage collection for
inventories and patches.
Tue Apr 8 22:12:56 ART 2014 Marcio Diaz <marcio.diaz@gmail.com>
* resolve issue1987: Garbage collection for inventories and patches.
Attachments
|
msg17314 (view) |
Author: gh |
Date: 2014-04-09.18:24:47 |
|
I think we're almost done:
* In `listPatchesLocal`, I'd rather you used `readInventoryPrivate`
instead of `readInventoryLocalPrivate`. The reason is that
`_darcs/hashed_inventory` should *always* be present in a repository.
In the contrary case, I'd rather have everything blow up sooner than
later.
* Owen's question: "Why do you convert the [FilePath] to [ByteString]
before converting to a
set, only to convert back after you've taken the intersection?"
* Shell test: good!
|
msg17315 (view) |
Author: owst |
Date: 2014-04-09.18:53:20 |
|
Further to that: both cleanInventories and cleanPatches define
where-bound set/unset expressions. Make them top level if they're really
needed.
Also, you use patchesDir and patchesDirPath, but inside
filterDirContents, you pass (darcsdir </> d), which presumably does the
same thing as makeDarcsDirPath, which is used to make patchesDirPath
different to patchesDir?
|
msg17321 (view) |
Author: mdiaz |
Date: 2014-04-10.01:53:27 |
|
Changes:
* makeDarcsdirPath now uses '</>' instead of '++ "/" ++',
I don't know if this is correct.
* I replaced 'darcsdir </> d' by 'makeDarcsdirPath d'.
* The functions set/unset were moved to the top level.
They may be needed because byteStrings makes optimize faster
(at least a fraction of second over my darcs repo with
70 inventories and 10k patches).
Also, Set.difference is a lot faster than \\ (a few seconds).
See Petr patch:
http://article.gmane.org/gmane.comp.version-control.darcs.user/21206/.
2 patches for repository http://darcs.net/screened:
Tue Apr 8 22:12:18 ART 2014 Marcio Diaz <marcio.diaz@gmail.com>
* Accept issue1987: regression tests for garbage collection for
inventories and patches.
Wed Apr 9 19:30:09 ART 2014 Marcio Diaz <marcio.diaz@gmail.com>
* resolve issue1987: Garbage collection for inventories and patches.
Attachments
|
msg17323 (view) |
Author: gh |
Date: 2014-04-10.14:00:08 |
|
> * makeDarcsdirPath now uses '</>' instead of '++ "/" ++',
> I don't know if this is correct.
Yes it is, see <http://hackage.haskell.org/package/filepath-1.3.0.2/docs/System-FilePath-Posix.html#v:combine>
Thanks for the changes, bundle accepted!
|
|
Date |
User |
Action |
Args |
2014-04-06 17:56:05 | mdiaz | create | |
2014-04-07 18:21:21 | gh | set | messages:
+ msg17278 |
2014-04-07 19:30:01 | mdiaz | set | files:
+ patch-preview.txt, resolve-issue1987_-garbage-collection-for-inventories-and-patches_.dpatch, unnamed messages:
+ msg17282 |
2014-04-07 19:58:57 | gh | set | messages:
+ msg17283 |
2014-04-07 19:59:20 | gh | set | messages:
+ msg17284 |
2014-04-07 21:11:16 | simon | set | nosy:
+ simon |
2014-04-07 22:33:31 | simon | set | messages:
+ msg17285 |
2014-04-07 22:44:30 | gh | set | messages:
+ msg17286 |
2014-04-07 23:00:13 | simon | set | messages:
+ msg17288 |
2014-04-07 23:35:45 | simon | set | messages:
+ msg17289 |
2014-04-08 10:40:50 | mdiaz | set | files:
+ patch-preview.txt, resolve-issue1987_-garbage-collection-for-inventories-and-patches_.dpatch, unnamed messages:
+ msg17290 |
2014-04-08 13:23:26 | simon | set | messages:
+ msg17292 |
2014-04-08 14:24:42 | owst | set | messages:
+ msg17293 |
2014-04-08 15:46:54 | gh | set | messages:
+ msg17296 |
2014-04-08 17:40:08 | gh | set | messages:
+ msg17298 |
2014-04-09 01:27:37 | mdiaz | set | files:
+ patch-preview.txt, accept-issue1987_-regression-tests-for-garbage-collection-for.dpatch, unnamed messages:
+ msg17306 |
2014-04-09 18:24:47 | gh | set | messages:
+ msg17314 |
2014-04-09 18:53:20 | owst | set | messages:
+ msg17315 |
2014-04-10 01:53:28 | mdiaz | set | files:
+ patch-preview.txt, accept-issue1987_-regression-tests-for-garbage-collection-for.dpatch, unnamed messages:
+ msg17321 |
2014-04-10 14:00:09 | gh | set | messages:
+ msg17323 |
2014-04-10 14:03:08 | gh | set | status: needs-screening -> accepted |
|