darcs

Patch 1180 Garbage collection of global cache.

Title Garbage collection of global cache.
Superseder Nosy List ganesh, mdiaz
Related Issues
Status accepted Assigned To mdiaz
Milestone

Created on 2014-07-16.12:50:36 by mdiaz, last changed 2014-10-21.14:16:35 by gh.

Files
File name Status Uploaded Type Edit Remove
fixed-file-does-not-exist-error-when-using-darcs-optimize-global_cache_.dpatch mdiaz, 2014-07-29.11:03:57 application/x-darcs-patch
fixed-file-does-not-exist-error-when-using-darcs-optimize-global_cache_.dpatch mdiaz, 2014-07-29.12:22:28 application/x-darcs-patch
followup-work-on-garbage-collecting-of-global-cache.dpatch gh, 2014-10-14.04:15:51 application/x-darcs-patch
followup-work-on-garbage-collecting-of-global-cache.dpatch gh, 2014-10-14.12:19:52 application/x-darcs-patch
garbage-collection-of-global-cache_.dpatch mdiaz, 2014-07-16.12:50:36 application/x-darcs-patch
garbage-collection-of-global-cache_.dpatch mdiaz, 2014-07-18.15:33:05 application/x-darcs-patch
garbage-collection-of-global-cache_.dpatch mdiaz, 2014-07-23.13:41:46 application/x-darcs-patch
garbage-collection-of-global-cache_.dpatch mdiaz, 2014-07-23.14:10:08 application/x-darcs-patch
garbage-collection-of-global-cache_.dpatch mdiaz, 2014-07-23.17:32:23 application/x-darcs-patch
optimize-cache-help-string.dpatch gh, 2014-10-21.14:14:38 application/x-darcs-patch
patch-preview.txt mdiaz, 2014-07-16.12:50:36 text/x-darcs-patch
patch-preview.txt mdiaz, 2014-07-18.15:33:05 text/x-darcs-patch
patch-preview.txt mdiaz, 2014-07-23.13:41:46 text/x-darcs-patch
patch-preview.txt mdiaz, 2014-07-23.14:10:08 text/x-darcs-patch
patch-preview.txt mdiaz, 2014-07-23.17:32:23 text/x-darcs-patch
patch-preview.txt mdiaz, 2014-07-29.11:03:57 text/x-darcs-patch
patch-preview.txt mdiaz, 2014-07-29.12:22:28 text/x-darcs-patch
patch-preview.txt gh, 2014-10-14.04:15:51 text/x-darcs-patch
patch-preview.txt gh, 2014-10-14.12:19:52 text/x-darcs-patch
patch-preview.txt gh, 2014-10-21.14:14:38 text/x-darcs-patch
unnamed mdiaz, 2014-07-16.12:50:36
unnamed mdiaz, 2014-07-17.20:13:49 text/html
unnamed mdiaz, 2014-07-18.15:33:05
unnamed mdiaz, 2014-07-23.13:41:46
unnamed mdiaz, 2014-07-23.14:10:08
unnamed mdiaz, 2014-07-23.17:32:23
unnamed mdiaz, 2014-07-29.11:03:57
unnamed mdiaz, 2014-07-29.12:22:28
unnamed gh, 2014-10-14.04:15:51
unnamed gh, 2014-10-14.12:19:52
unnamed gh, 2014-10-21.14:14:38
See mailing list archives for discussion on individual patches.
Messages
msg17610 (view) Author: mdiaz Date: 2014-07-16.12:50:36
This patch adds the global-cache subcommand to the optimize command.
This subcommand takes as arguments directories, search repositories 
within those directories, and delete files in the global cache 
that are not used for these repositories.

The patch still lacks some details, for example:
- Generalizing functions such as: getRecursiveContents, 
getRecursiveDarcsRepos in the module Commands.Optimize, 
and getRecursiveContentsFullPath in the module Util.File.
- What should happen if I run the command without directories as arguments.
- Help string.

But I send it so you can start reviewing it.

1 patch for repository http://darcs.net/screened:

Wed Jul 16 09:04:20 ART 2014  Marcio Diaz <marcio.diaz@gmail.com>
  * Garbage collection of global cache.
Attachments
msg17611 (view) Author: ganesh Date: 2014-07-16.23:13:18
Some partial comments:

> hunk ./src/Darcs/Repository/HashedRepo.hs 670
>  -- with inventory" hashes. This function does not attempt to download
missing
>  -- inventory files.
> -listPatchesLocal :: IO [String]
> -listPatchesLocal = do
> -    inventory <- readInventoryPrivate darcsdir hashedInventory
> +listPatchesLocal :: String -> IO [String]
> +listPatchesLocal darcsDir = do
> +    inventory <- readInventoryPrivate darcsDir hashedInventory
>      followStartingWiths (fst inventory) (getPatches inventory)
>      where
>          followStartingWiths Nothing patches = return patches

How does this interact with lazy repositories, as we discussed last week?

I think we also discussed an approach of just GCing the user
repositories first, then GCing the global cache. Which algorithm is
being used in this patch?

> hunk ./src/Darcs/Repository/HashedRepo.hs 658
>      if b then readInventoryPrivate dir invName
>          else return (Nothing, [])
>
> -
>  -- |listInventoriesLocal this function does not attempt to retrieve
missing
>  -- inventory files.
>  listInventoriesLocal :: IO [String]

Please amend out this unnecessary whitespace change (adds noise, risk of
conflicts with other changes).

> +-- | getRecursiveDarcsRepos returns all directories under topdir that
have
> +-- a child directory named darcsdir ("_darcs").
> +getRecursiveDarcsRepos :: FilePath -> IO [FilePath]
> +getRecursiveDarcsRepos topdir = do
> +  names <- getDirectoryContents topdir
> +  let properNames = filter (`notElem` [".", ".."]) names
> +  paths <- forM properNames $ \name -> do
> +    let path = topdir </> name
> +    isDir <- doesDirectoryExist path
> +    if isDir && (name /= darcsdir)
> +      then getRecursiveDarcsRepos path
> +      else if name == darcsdir

This logic feels a bit convoluted. Wouldn't it lead to a file named
"_darcs" being recognised? I think the outer if should check 'isDir',
and the inner if check 'name == darcsdir'
msg17613 (view) Author: mdiaz Date: 2014-07-17.20:13:49
On Wed, Jul 16, 2014 at 8:13 PM, Ganesh Sittampalam <bugs@darcs.net> wrote:

>
> Ganesh Sittampalam <ganesh@earth.li> added the comment:
>
> Some partial comments:
>
> > hunk ./src/Darcs/Repository/HashedRepo.hs 670
> >  -- with inventory" hashes. This function does not attempt to download
> missing
> >  -- inventory files.
> > -listPatchesLocal :: IO [String]
> > -listPatchesLocal = do
> > -    inventory <- readInventoryPrivate darcsdir hashedInventory
> > +listPatchesLocal :: String -> IO [String]
> > +listPatchesLocal darcsDir = do
> > +    inventory <- readInventoryPrivate darcsDir hashedInventory
> >      followStartingWiths (fst inventory) (getPatches inventory)
> >      where
> >          followStartingWiths Nothing patches = return patches
>
> How does this interact with lazy repositories, as we discussed last week?
>

It doesn't interact with lazy repositories. I just realized that we can only
read the hashed_inventory file from the repository and then read the other
inventories from the global cache.


> I think we also discussed an approach of just GCing the user
> repositories first, then GCing the global cache. Which algorithm is
> being used in this patch?
>

This patch doesn't garbage collect the repositories. I don't like the idea
of ​​doing two things at the same time, and there is a command that does
that.


> > hunk ./src/Darcs/Repository/HashedRepo.hs 658
> >      if b then readInventoryPrivate dir invName
> >          else return (Nothing, [])
> >
> > -
> >  -- |listInventoriesLocal this function does not attempt to retrieve
> missing
> >  -- inventory files.
> >  listInventoriesLocal :: IO [String]
>
> Please amend out this unnecessary whitespace change (adds noise, risk of
> conflicts with other changes).
>
>
Ok.


> > +-- | getRecursiveDarcsRepos returns all directories under topdir that
> have
> > +-- a child directory named darcsdir ("_darcs").
> > +getRecursiveDarcsRepos :: FilePath -> IO [FilePath]
> > +getRecursiveDarcsRepos topdir = do
> > +  names <- getDirectoryContents topdir
> > +  let properNames = filter (`notElem` [".", ".."]) names
> > +  paths <- forM properNames $ \name -> do
> > +    let path = topdir </> name
> > +    isDir <- doesDirectoryExist path
> > +    if isDir && (name /= darcsdir)
> > +      then getRecursiveDarcsRepos path
> > +      else if name == darcsdir
>
> This logic feels a bit convoluted. Wouldn't it lead to a file named
> "_darcs" being recognised? I think the outer if should check 'isDir',
> and the inner if check 'name == darcsdir'
>
>
The idea is to recognize the parents of "_darcs" directories.


> __________________________________
> Darcs bug tracker <bugs@darcs.net>
> <http://bugs.darcs.net/patch1180>
> __________________________________
>
Attachments
msg17615 (view) Author: mdiaz Date: 2014-07-18.15:33:05
Now the patch interact with lazy repos, it only read
the hashed_inventory from the repos. Other inventories, 
patches and pristine files are read from the global cache.

Also, I fixed the convoluted logic of the function getRecursiveDarcsRepos.

1 patch for repository http://darcs.net/screened:

Fri Jul 18 12:26:52 ART 2014  Marcio Diaz <marcio.diaz@gmail.com>
  * Garbage collection of global cache.
Attachments
msg17616 (view) Author: ganesh Date: 2014-07-20.23:15:49
Looks good - what's left to do (if anything)?
msg17623 (view) Author: mdiaz Date: 2014-07-23.13:41:46
Changes from last patch:
- Commented several functions in the Repository.HashedRepo module.
- New description of the optimize global-cache command (added optimizeHelpGlobalCache variable).
- Moved getRecursiveDarcsRepos from UI.Commands.Optimize to Repository.Util.
- Moved getRecursiveContents from UI.Commands.Optimize to Util.File.

1 patch for repository http://darcs.net/screened:

Wed Jul 23 10:39:21 ART 2014  Marcio Diaz <marcio.diaz@gmail.com>
  * Garbage collection of global cache.
Attachments
msg17624 (view) Author: mdiaz Date: 2014-07-23.14:10:08
Changes:
- Without arguments returns a message saying that the command does nothing.

Ganesh: I think there is nothing more to do, except maybe a test.
1 patch for repository http://darcs.net/screened:

Wed Jul 23 11:07:00 ART 2014  Marcio Diaz <marcio.diaz@gmail.com>
  * Garbage collection of global cache.
Attachments
msg17626 (view) Author: mdiaz Date: 2014-07-23.17:32:23
Fixed some typo:
- Changed "hashed_file" for "hashed_inventory" 
in the documentation of some functions. 
1 patch for repository http://darcs.net/screened:

Wed Jul 23 14:30:05 ART 2014  Marcio Diaz <marcio.diaz@gmail.com>
  * Garbage collection of global cache.
Attachments
msg17629 (view) Author: ganesh Date: 2014-07-28.18:46:18
I'm pushing this to screened now so please send updates as followup 
patches.

I tried it and it failed with:

$ dist/build/darcs/darcs optimize global-cache ../..
darcs.exe: ../../temp-rebase-extra/_darcs/hashed_inventory: 
openBinaryFile: does not exist (No such file or directory)

I think it's because I have a corrupt repo in my home directory, but the 
code should be tolerant to that. Could you fix that?
msg17630 (view) Author: mdiaz Date: 2014-07-29.11:03:57
Changes:
- Function maybeIdentifyRepository is now used to check the repositories.
- Directories inventories, patches and pristine.hashed are created if don't exist.

1 patch for repository http://darcs.net/screened:

patch 3590f61fba6302ee71d8aec9951b493d5b689c7c
Author: Marcio Diaz <marcio.diaz@gmail.com>
Date:   Tue Jul 29 07:45:47 ART 2014
  * Fixed file does not exist error when using darcs optimize global-cache.
Attachments
msg17631 (view) Author: mdiaz Date: 2014-07-29.12:22:28
Changes:
- I made several changes to avoid unnecessary recursions of the function getRecursiveDarcsRepos.
Now the variable repoPaths only contains different paths.

1 patch for repository http://darcs.net/screened:

patch 3a4431c09ef354dbbe2a140659299648f291ce13
Author: Marcio Diaz <marcio.diaz@gmail.com>
Date:   Tue Jul 29 09:17:24 ART 2014
  * Fixed file does not exist error when using darcs optimize global-cache.
Attachments
msg17644 (view) Author: gh Date: 2014-08-05.15:48:09
I've screened this last patch.

I'd like to see both subcommands:

  cache                optimize global cache for access through shell
commands like ls/rm
  global-cache         garbage collect global cache


merged into one. I think we can stay with the name "cache", and just
describe it as "garbage collect global cache".  The transition from
non-bucketed to bucketed is something that happens once, so I don't feel
it deserves a separate subcommand. If someone thinks of particular cases
where for performance or testing sake we want to keep them separated,
shout asap.
msg17654 (view) Author: gh Date: 2014-08-15.21:56:17
Also I think users will expect the command to work by default by taking
their home directory.

I mean, without passing any argument, it should print something like
"Will clean global cache according to repositories present in $HOME (<-
expand this)", and proceed.
msg17683 (view) Author: gh Date: 2014-10-08.15:27:27
Marcio, yell here if you are still working on this. I may be writing
the followup patches myself soon.
msg17687 (view) Author: gh Date: 2014-10-14.04:15:51
Here's a first attempt at a followup. I'm not screening it
yet in case it lacks something.

1 patch for repository http://darcs.net:

patch 8ed115f087e9356a6dd9c5768341ff76d5bc97d3
Author: Guillaume Hoffmann <guillaumh@gmail.com>
Date:   Mon Oct 13 20:23:22 ART 2014
  * followup work on garbage collecting of global cache
  * do bucketting and garbage collection in single command
    'optimize cache'
  * no longer recurse in dot-directories when looking for
    repotories
  * no longer consider OF repositories
  * when no directory given look into home
Attachments
msg17688 (view) Author: gh Date: 2014-10-14.12:19:52
Screening this one.

1 patch for repository http://darcs.net:

patch c3f56af0d782e8ce1d9a32e303ac32e36004f25a
Author: Guillaume Hoffmann <guillaumh@gmail.com>
Date:   Tue Oct 14 09:18:32 ART 2014
  * followup work on garbage collecting of global cache
  * do bucketting and garbage collection in single command
    'optimize cache'
  * no longer recurse in dot-directories when looking for
    repotories
  * no longer consider OF repositories
  * when no directory given look into home
Attachments
msg17708 (view) Author: gh Date: 2014-10-21.14:14:38
I'm accepting the whole bundle with this help string patch.

1 patch for repository http://darcs.net:

patch 2df506c413629e43e44f62af9df42267bd3becd5
Author: Guillaume Hoffmann <guillaumh@gmail.com>
Date:   Tue Oct 21 11:13:36 ART 2014
  * optimize cache help string
Attachments
History
Date User Action Args
2014-07-16 12:50:36mdiazcreate
2014-07-16 23:13:18ganeshsetmessages: + msg17611
2014-07-17 20:13:49mdiazsetfiles: + unnamed
messages: + msg17613
2014-07-18 15:33:06mdiazsetfiles: + patch-preview.txt, garbage-collection-of-global-cache_.dpatch, unnamed
messages: + msg17615
2014-07-20 23:15:49ganeshsetassignedto: ganesh
messages: + msg17616
nosy: + ganesh
2014-07-23 13:41:46mdiazsetfiles: + patch-preview.txt, garbage-collection-of-global-cache_.dpatch, unnamed
messages: + msg17623
2014-07-23 14:10:09mdiazsetfiles: + patch-preview.txt, garbage-collection-of-global-cache_.dpatch, unnamed
messages: + msg17624
2014-07-23 17:32:23mdiazsetfiles: + patch-preview.txt, garbage-collection-of-global-cache_.dpatch, unnamed
messages: + msg17626
2014-07-28 18:46:18ganeshsetstatus: needs-screening -> needs-review
messages: + msg17629
2014-07-28 18:46:23ganeshsetstatus: needs-review -> followup-requested
assignedto: ganesh -> mdiaz
2014-07-29 11:03:58mdiazsetfiles: + patch-preview.txt, fixed-file-does-not-exist-error-when-using-darcs-optimize-global_cache_.dpatch, unnamed
messages: + msg17630
2014-07-29 12:22:28mdiazsetfiles: + patch-preview.txt, fixed-file-does-not-exist-error-when-using-darcs-optimize-global_cache_.dpatch, unnamed
messages: + msg17631
2014-08-05 15:48:10ghsetmessages: + msg17644
2014-08-15 21:56:17ghsetmessages: + msg17654
2014-10-08 15:27:27ghsetmessages: + msg17683
2014-10-14 04:15:51ghsetfiles: + patch-preview.txt, followup-work-on-garbage-collecting-of-global-cache.dpatch, unnamed
messages: + msg17687
2014-10-14 12:19:53ghsetfiles: + patch-preview.txt, followup-work-on-garbage-collecting-of-global-cache.dpatch, unnamed
messages: + msg17688
2014-10-14 12:20:48ghsetstatus: followup-requested -> needs-review
2014-10-21 14:14:38ghsetfiles: + patch-preview.txt, optimize-cache-help-string.dpatch, unnamed
messages: + msg17708
2014-10-21 14:16:35ghsetstatus: needs-review -> accepted