darcs

Patch 1162 resolve issue1624: bucketed cache.

Title resolve issue1624: bucketed cache.
Superseder Nosy List ganesh, mdiaz
Related Issues
Status accepted Assigned To ganesh
Milestone

Created on 2014-05-24.10:28:02 by mdiaz, last changed 2014-07-16.11:57:57 by gh.

Files
File name Status Uploaded Type Edit Remove
patch-preview.txt mdiaz, 2014-05-24.10:28:01 text/x-darcs-patch
patch-preview.txt mdiaz, 2014-06-24.16:04:38 text/x-darcs-patch
patch-preview.txt mdiaz, 2014-06-25.00:48:36 text/x-darcs-patch
patch-preview.txt mdiaz, 2014-07-14.21:36:00 text/x-darcs-patch
patch-preview.txt mdiaz, 2014-07-15.15:14:31 text/x-darcs-patch
patch-preview.txt mdiaz, 2014-07-15.15:42:56 text/x-darcs-patch
patch-preview.txt mdiaz, 2014-07-15.19:11:52 text/x-darcs-patch
patch-preview.txt mdiaz, 2014-07-16.02:33:31 text/x-darcs-patch
patch-preview.txt mdiaz, 2014-07-16.08:50:49 text/x-darcs-patch
patch-preview.txt mdiaz, 2014-07-16.10:32:44 text/x-darcs-patch
resolve-issue1624_-bucketed-cache_.dpatch mdiaz, 2014-05-24.10:28:01 application/x-darcs-patch
resolve-issue1624_-bucketed-cache_.dpatch mdiaz, 2014-06-24.16:04:38 application/x-darcs-patch
resolve-issue1624_-bucketed-cache_.dpatch mdiaz, 2014-06-25.00:48:36 application/x-darcs-patch
resolve-issue1624_-bucketed-cache_.dpatch mdiaz, 2014-07-14.21:36:00 application/x-darcs-patch
resolve-issue1624_-bucketed-cache_.dpatch mdiaz, 2014-07-15.15:14:31 application/x-darcs-patch
resolve-issue1624_-bucketed-cache_.dpatch mdiaz, 2014-07-15.15:42:56 application/x-darcs-patch
resolve-issue1624_-bucketed-cache_.dpatch mdiaz, 2014-07-15.19:11:52 application/x-darcs-patch
resolve-issue1624_-bucketed-cache_.dpatch mdiaz, 2014-07-16.02:33:31 application/x-darcs-patch
resolve-issue1624_-bucketed-cache_.dpatch mdiaz, 2014-07-16.08:50:49 application/x-darcs-patch
resolve-issue1624_-bucketed-cache_.dpatch mdiaz, 2014-07-16.10:32:44 application/x-darcs-patch
unnamed mdiaz, 2014-05-24.10:28:01
unnamed mdiaz, 2014-06-24.16:04:38
unnamed mdiaz, 2014-06-25.00:48:36
unnamed mdiaz, 2014-07-14.21:36:00
unnamed mdiaz, 2014-07-15.14:47:16 text/html
unnamed mdiaz, 2014-07-15.15:14:31
unnamed mdiaz, 2014-07-15.15:42:56
unnamed mdiaz, 2014-07-15.19:11:52
unnamed mdiaz, 2014-07-16.02:33:31
unnamed mdiaz, 2014-07-16.08:50:49
unnamed mdiaz, 2014-07-16.10:32:44
See mailing list archives for discussion on individual patches.
Messages
msg17478 (view) Author: mdiaz Date: 2014-05-24.10:28:01
1 patch for repository http://darcs.net/screened:

Sat May 24 12:24:07 CEST 2014  Marcio Diaz <marcio.diaz@gmail.com>
  * resolve issue1624: bucketed cache.
Attachments
msg17479 (view) Author: ganesh Date: 2014-05-24.10:38:25
Marking this as in-discussion for now as we need to see if it's worthwhile 
before definitely merging it.
msg17499 (view) Author: ganesh Date: 2014-05-30.21:54:23
As discussed, the most important next step is to get reproducible
benchmarks ready so other people can try out a bucketed cache in
different scenarios.

If it is worthwhile to switch, I think there's a bit more work to do to
implement the algorithm in http://bugs.darcs.net/msg10318 - if I
understand your changes correctly, currently the just migrate the cache
during darcs optimize and they don't do the hard linking.

A few comments on the specific changes inline. The rest of the code
itself looks fine.

> [resolve issue1624: bucketed cache.
> Marcio Diaz <marcio.diaz@gmail.com>**20140524102407
>  Ignore-this: 798f120b1afe803564c7b52976c6f045
>  
> ] hunk ./src/Darcs/Repository.hs 169
>                                , CacheLoc(..)
>                                , WritableOrNot(..)
>                                , hashedDir
> +                              , createBucket
>                                , CacheType(Directory)
>                                , reportBadSources
>                                )
> hunk ./src/Darcs/Repository.hs 657
>      BL.writeFile x' y
>      renameFile x' z
>    writeFile' (Just ca) z y = do
> -    let x' = joinPath . tail $ splitPath z -- drop darcsdir
> -    ex <- doesFileExist $ ca </> x'
> +    let splitedPath = splitPath z
> +        hDir = splitedPath !! 1 -- hashed directory
> +        hFile = splitedPath !! 2 -- hashed file

Can you express this with a case statement and pattern matching instead?
It's generally better style than explicitly indexing into a list.

Also 'splitedPath' sounds a bit funny in English - perhaps
'pathElements' or similar - though with a case statement you probably
wouldn't need to give it a name anyway.

> hunk ./src/Darcs/Repository/Cache.hs 212

> +createBucket :: String -> String
> +createBucket f = take 2 (cleanHash f)
> +    where
> +        cleanHash fileName = case dropWhile (/= '-') fileName of
> +            []  -> fileName
> +            s   -> drop 1 s
> +

The name of this function implies it will actually do something to the
filesystem. How about 'chooseBucket', 'bucketName', or 'bucketFolder' ?


> hunk ./tests/gzcrcs.sh 35
>  rm -rf maybench-crc
>  gunzip -c $TESTDATA/maybench-crc.tgz | tar xf -
>  cd maybench-crc
> -not darcs gzcrcs --check
> -darcs gzcrcs --repair
> -darcs gzcrcs --check
> +not darcs gzcrcs --check --debug
> +darcs gzcrcs --repair --debug
> +darcs gzcrcs --check --debug

Is this change a mistake?
msg17565 (view) Author: mdiaz Date: 2014-06-24.16:04:38
I updated the patch following the comments.

Now the behavior is as follows:

- darcs can read patches in the repositories located in ~/.darcs/cache/ and ~/.cache/darcs/ (if it follows the bucket format).
- While reading the patches in ~/.darcs/cache/ it binds them to the cache in ~/.cache/darcs/, using the bucket format.

I also added a new subcommand to darcs optimize, darcs optimize bucketed.
This command is used to migrate the cache in ~/.darcs/cache/ and the cache in ~/.cache/darcs/ (in non bucket format)
to ~/.cache/darcs/ in bucket format.

I think the behavior in Windows should be similar.

1 patch for repository http://darcs.net/screened:

Tue Jun 24 12:54:02 ART 2014  Marcio Diaz <marcio.diaz@gmail.com>
  * resolve issue1624: bucketed cache.
Attachments
msg17566 (view) Author: gh Date: 2014-06-24.19:06:47
Nice.

* GZCRCs: please remove these changes, they belong to another patch.
* Repository.hs: ok, adapting the code of tar unpacking
* Prefs: ok
* HashedRepo: ok

* Cache.hs:

the fact that we have both functions hashedFilePath and
hashedFilePathOld is a little hard to understand.

Two options: 1) Call the second one hashedFilePathReadOnly, and in the
first case of its definition add a comment like:

    -- if directory, assume it is non-bucketed cache (old cache location)

2) An alternative, maybe cleaner, is to add a third constructor to
CacheType, say DirectoryOld or DirectoryNonBucketed, make the function
getCaches use this constructor for the variable `oldGlobalcache` and
only have one function hashedFilePath (and do without hashedFilePathOld).

Now as you said, with your patch, files found in the old (non-bucketed)
location get automatically hard-linked into the new (bucketed) cache
location when they get accessed. This happens since darcs systematically
copies files from non-writable sources to writable sources. I guess the
logic is that a typical non-writable source is a remote repository
accessible by HTTP, and you want that files to be copied locally to the
cache.

It's not bad actually, since hard linking files does not use more space
on the hard drive and this makes the local caches more robust to
deleting the old cache directory. So I think it's good.

The only caveat I have is to make sure the function
fetchFileUsingCachePrivate does not systematically look into the old
cache *before* the new cache, otherwise its files will get linked over
and over again to the new cache. I'd update the function
`compareByLocality` with an extra case to compare writeability of cache
sources and ensure local writable < local non-writable.

As for `darcs optimize bucketed`, I'd simply call it `darcs optimize
cache`. Moreover you should make it more robust to the absence of either
inventories / pristine.hashed / patches subdirectories of ~/.darcs/cache
(currently it throws you an error).

Finally, for Windows, you should make that APPDATA\darcs\cache (I think
that's it) be considered as the old nonbucketed cache and
APPDATA\darcs\cache2 the new bucketed one.
msg17583 (view) Author: mdiaz Date: 2014-06-25.00:48:36
I updated the patche according to the previous comment.
I'm not sure about the changes corresponding to the cache path in Windows, I didn't test it yet.

1 patch for repository http://darcs.net/screened:

Tue Jun 24 21:33:29 ART 2014  Marcio Diaz <marcio.diaz@gmail.com>
  * resolve issue1624: bucketed cache.
Attachments
msg17592 (view) Author: gh Date: 2014-07-06.22:03:27
Sorry for taking so long to look at this patch, but I only have one last
complain. I'd rather have the toBucketed function to check the existence
of 'src' before doing anything, so that it does not create the directory
~/.darcs/cache again. Apart from this I would accept the patch.
msg17597 (view) Author: mdiaz Date: 2014-07-14.21:36:00
Added check for existence of src directory.
1 patch for repository http://darcs.net/screened:

Mon Jul 14 18:29:15 ART 2014  Marcio Diaz <marcio.diaz@gmail.com>
  * resolve issue1624: bucketed cache.
Attachments
msg17598 (view) Author: gh Date: 2014-07-15.14:44:42
One change I forgot about: let "darcs optimize cache" be runnable 
anywhere (ie outside of any repository). Have a look at the Init module 
to see how it's done.
msg17599 (view) Author: mdiaz Date: 2014-07-15.14:47:16
Oh, yes, I made the darcs optimize global-cache in that way.


On Tue, Jul 15, 2014 at 11:44 AM, Guillaume Hoffmann <bugs@darcs.net> wrote:

>
> Guillaume Hoffmann <guillaumh@gmail.com> added the comment:
>
> One change I forgot about: let "darcs optimize cache" be runnable
> anywhere (ie outside of any repository). Have a look at the Init module
> to see how it's done.
>
> __________________________________
> Darcs bug tracker <bugs@darcs.net>
> <http://bugs.darcs.net/patch1162>
> __________________________________
>
Attachments
msg17600 (view) Author: mdiaz Date: 2014-07-15.15:14:31
1 patch for repository http://darcs.net/screened:

Tue Jul 15 12:08:30 ART 2014  Marcio Diaz <marcio.diaz@gmail.com>
  * resolve issue1624: bucketed cache.
Attachments
msg17601 (view) Author: mdiaz Date: 2014-07-15.15:42:57
1 patch for repository http://darcs.net/screened:

Tue Jul 15 12:40:02 ART 2014  Marcio Diaz <marcio.diaz@gmail.com>
  * resolve issue1624: bucketed cache.
Attachments
msg17602 (view) Author: mdiaz Date: 2014-07-15.19:11:52
1 patch for repository http://darcs.net/screened:

Tue Jul 15 16:10:45 ART 2014  Marcio Diaz <marcio.diaz@gmail.com>
  * resolve issue1624: bucketed cache.
Attachments
msg17603 (view) Author: mdiaz Date: 2014-07-16.02:33:31
1 patch for repository http://darcs.net/screened:

Tue Jul 15 23:32:16 ART 2014  Marcio Diaz <marcio.diaz@gmail.com>
  * resolve issue1624: bucketed cache.
Attachments
msg17604 (view) Author: gh Date: 2014-07-16.08:32:17
Please, *always* provide some explanation each time you send a new 
bundle.

For instance here's what could describe your last update:

"This makes the command "gzcrcs" work with bucketed caches, by 
recursively finding all files in all subdirectories of some hashed 
directory, so that it also find files inside of the bucket 
subdirectories (00, 01, 02...)."


Now, make sure your patch does not generate compilation warnings.

Also, in the function optimizeBucketedCmd, I believe that in the case 
where gCacheDir is Nothing, you should use "fail" instead of 
"debugMessage", since running "darcs optimize cache" when no global 
cache exists does not make sense. And the other case should end up with
msg17605 (view) Author: mdiaz Date: 2014-07-16.08:50:49
I changed debugMessage by fail and moved putInfo inside the Just case of 
gCacheDir.
1 patch for repository http://darcs.net/screened:

Wed Jul 16 05:48:21 ART 2014  Marcio Diaz <marcio.diaz@gmail.com>
  * resolve issue1624: bucketed cache.
Attachments
msg17606 (view) Author: gh Date: 2014-07-16.09:40:36
Good, but that putInfo message should happen after everything is done 
(new and old cache migration), as the operation can take some time. It 
may be also more readable to put the Nothing cases before the Just cases.

Also, what about the compile warnings?
msg17607 (view) Author: mdiaz Date: 2014-07-16.10:32:44
- Changed the putInfo message to the end of the migration.
- I put the Nothing cases before the Just cases.
- Removed unnecessary import that generate warnings.
1 patch for repository http://darcs.net/screened:

Wed Jul 16 07:24:01 ART 2014  Marcio Diaz <marcio.diaz@gmail.com>
  * resolve issue1624: bucketed cache.
Attachments
msg17608 (view) Author: gh Date: 2014-07-16.11:57:57
Good, accepted.
History
Date User Action Args
2014-05-24 10:28:02mdiazcreate
2014-05-24 10:38:26ganeshsetstatus: needs-screening -> in-discussion
assignedto: ganesh
messages: + msg17479
nosy: + ganesh
2014-05-30 21:54:24ganeshsetmessages: + msg17499
2014-06-24 16:04:38mdiazsetfiles: + patch-preview.txt, resolve-issue1624_-bucketed-cache_.dpatch, unnamed
messages: + msg17565
2014-06-24 19:06:47ghsetmessages: + msg17566
2014-06-25 00:48:37mdiazsetfiles: + patch-preview.txt, resolve-issue1624_-bucketed-cache_.dpatch, unnamed
messages: + msg17583
2014-07-06 22:03:28ghsetmessages: + msg17592
2014-07-14 21:36:00mdiazsetfiles: + patch-preview.txt, resolve-issue1624_-bucketed-cache_.dpatch, unnamed
messages: + msg17597
2014-07-15 14:44:43ghsetmessages: + msg17598
2014-07-15 14:47:16mdiazsetfiles: + unnamed
messages: + msg17599
2014-07-15 15:14:31mdiazsetfiles: + patch-preview.txt, resolve-issue1624_-bucketed-cache_.dpatch, unnamed
messages: + msg17600
2014-07-15 15:42:57mdiazsetfiles: + patch-preview.txt, resolve-issue1624_-bucketed-cache_.dpatch, unnamed
messages: + msg17601
2014-07-15 19:11:53mdiazsetfiles: + patch-preview.txt, resolve-issue1624_-bucketed-cache_.dpatch, unnamed
messages: + msg17602
2014-07-16 02:33:31mdiazsetfiles: + patch-preview.txt, resolve-issue1624_-bucketed-cache_.dpatch, unnamed
messages: + msg17603
2014-07-16 08:32:17ghsetmessages: + msg17604
2014-07-16 08:50:49mdiazsetfiles: + patch-preview.txt, resolve-issue1624_-bucketed-cache_.dpatch, unnamed
messages: + msg17605
2014-07-16 09:40:36ghsetmessages: + msg17606
2014-07-16 10:32:44mdiazsetfiles: + patch-preview.txt, resolve-issue1624_-bucketed-cache_.dpatch, unnamed
messages: + msg17607
2014-07-16 11:57:57ghsetstatus: in-discussion -> accepted
messages: + msg17608