darcs

Issue 1624 break global cache up into subdirectories

Title break global cache up into subdirectories
Priority feature
Status resolved
Milestone 2.10.0 HEAD
Resolved in 2.10.0 HEAD
Superseder
Nosy List darcs-devel, dmitry.kurochkin, ganesh, kolibrie, kowey, luca, mdiaz, mornfall, simon, simonmar, thorkilnaur, volothamp
Assigned To mdiaz
Topics Hashed, Performance

Created on 2009-09-21.15:35:55 by kowey, last changed 2014-07-16.11:59:02 by noreply.

Messages
msg8855 (view) Author: kowey Date: 2009-09-21.15:35:48
From issue1536.  This we can do without a format change.
msg12208 (view) Author: kowey Date: 2010-08-16.10:47:53
Whoops! I just noticed this bug was wrongly assigned to Luca Capello, when 
it should have been assigned to Luca Molteni instead.  My apologies to 
both Lucas.
msg17561 (view) Author: gh Date: 2014-06-19.23:27:09
I got curious about cache performance, so I've written bash scripts that
generate 65K bogus cache files per cache subdir (patches/, inventories/
and pristine.hashed/), in both non-bucketed and bucketed layouts:

http://hub.darcs.net/gh/bench/browse/do_cache_bogus.sh
http://hub.darcs.net/gh/bench/browse/do_cache_bogus_bucketed.sh
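
For readers who cannot reach the scripts, here is a rough sketch of the idea in Python. It is not a copy of the linked scripts. The file names (SHA-256 hex digests of a counter) and the bucketing rule (a subdirectory named after the first two characters of the file name) are assumptions made for illustration, as are the helper names `bogus_name`, `bucket_of` and `fill_cache`.

```python
import hashlib
import os
import tempfile

# The three cache subdirectories named in the message above.
SUBDIRS = ("patches", "inventories", "pristine.hashed")


def bogus_name(i):
    """Return a hash-like file name (assumption: a SHA-256 hex digest)."""
    return hashlib.sha256(str(i).encode()).hexdigest()


def bucket_of(name, width=2):
    """Assumed bucketing rule: the first `width` characters of the name."""
    return name[:width]


def fill_cache(root, count, bucketed):
    """Create `count` empty bogus files in each cache subdirectory."""
    for sub in SUBDIRS:
        for i in range(count):
            name = bogus_name(i)
            parent = os.path.join(root, sub)
            if bucketed:
                parent = os.path.join(parent, bucket_of(name))
            os.makedirs(parent, exist_ok=True)
            # Empty files are enough to exercise directory listing.
            open(os.path.join(parent, name), "w").close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fill_cache(os.path.join(tmp, "flat"), 200, bucketed=False)
        fill_cache(os.path.join(tmp, "bucketed"), 200, bucketed=True)
        flat_entries = len(os.listdir(os.path.join(tmp, "flat", "patches")))
        buckets = len(os.listdir(os.path.join(tmp, "bucketed", "patches")))
        print(f"flat: {flat_entries} entries in patches/")
        print(f"bucketed: {buckets} bucket directories in patches/")
```

With two hexadecimal characters per bucket, no directory holds more than 256 buckets, so each individual directory stays small even when the cache as a whole is large.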

I compared cloning a repo over HTTP ( time darcs clone
--lazy http://hub.darcs.net/darcs/darcs-wiki ) with an empty cache and
with a big bogus cache, and found no relevant difference. I guess the
speed of accessing and writing a particular file is independent of how
many files are in its directory.

What is slow is listing all the files in a big directory.
Currently darcs never needs to list files, but I tested as follows on my
personal laptop (bought 3 years ago but still relevant):

time (tree cache/darcs/pristine.hashed/ |wc -l ) : 17s
time (tree cache/darcs/pristine.hashed/ |wc -l ) : 8s

When doing the same with 5 times as many files (325K files), the
difference grows to 1m35s versus 32s.

(I reset the cache between measurements with sudo sh -c 'echo 3 >
/proc/sys/vm/drop_caches' , and I'm using ext4 on both machines.)
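
A listing measurement of this kind can also be approximated without tree. The following is a minimal sketch in Python, assuming a full recursive walk is a fair stand-in for `tree | wc -l`; the helper names `count_entries` and `time_listing` are made up for illustration. It does not drop the kernel caches, so that step still has to be done separately before each run to get cold-cache numbers.

```python
import os
import time


def count_entries(root):
    """Count every file and directory below `root`."""
    total = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        total += len(dirnames) + len(filenames)
    return total


def time_listing(root):
    """Return (entry count, elapsed seconds) for one full walk of `root`."""
    start = time.perf_counter()
    total = count_entries(root)
    return total, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical target: replace "." with the cache directory to measure.
    total, seconds = time_listing(".")
    print(f"{total} entries in {seconds:.2f}s")
```

Running it once against a flat cache directory and once against a bucketed one gives the same kind of comparison as the two timings above, although absolute numbers will differ from those of tree.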

So, I'm not expecting darcs to run faster with the switch to bucketed
cache right now, nor slower, but I bet global cache garbage collection
will be faster (once it's implemented). Moreover the global cache will be more
manageable by third-party programs (like ls or rm :-) ).
msg17609 (view) Author: noreply Date: 2014-07-16.11:59:01
The following patch sent by Marcio Diaz <marcio.diaz@gmail.com> updated issue1624 with
status=resolved;resolvedin=2.10.0 HEAD

* resolve issue1624: bucketed cache. 
Ignore-this: 2d077f5c10156e4a00631fbc4f8c3119
History
Date                 User        Action  Args
2009-09-21 15:35:55  kowey       create
2009-09-21 15:37:11  kowey       link    issue1536 superseder
2009-10-23 22:37:17  admin       set     nosy: + marlowsd, - simonmar
2009-10-23 23:35:44  admin       set     nosy: + simonmar, - marlowsd
2009-11-15 17:09:00  tux_rocker  link    patch72 issues
2009-11-15 17:10:39  tux_rocker  set     nosy: + luca
                                         assignedto: luca
2010-03-01 13:21:16  kowey       set     topic: + Target-2.5, - Target-2.4
2010-04-06 11:03:44  kowey       set     topic: + Hashed
2010-06-15 20:52:07  admin       set     milestone: 2.5.0
2010-06-15 20:59:41  admin       set     topic: - Target-2.5
2010-07-25 14:34:55  tux_rocker  set     milestone: 2.5.0 -> 2.8.0
2010-08-16 10:47:54  kowey       set     assignedto: luca -> volothamp
                                         messages: + msg12208
                                         nosy: + volothamp
2012-05-11 20:01:04  gh          set     assignedto: volothamp -> (no value)
2013-07-22 12:51:45  gh          set     milestone: 2.8.0 -> 2.10.0 HEAD
2014-03-22 23:59:48  gh          set     assignedto: mdiaz
                                         nosy: + mdiaz
2014-06-19 23:27:11  gh          set     nosy: + ganesh, simon
                                         messages: + msg17561
2014-07-16 11:59:02  noreply     set     status: needs-implementation -> resolved
                                         messages: + msg17609
                                         resolvedin: 2.10.0 HEAD