darcs

Issue 1624 break global cache up into subdirectories

Title: break global cache up into subdirectories
Priority: feature
Status: resolved
Milestone: 2.10.0
Resolved in: 2.10.0
Superseder:
Nosy List: darcs-devel, dmitry.kurochkin, ganesh, kolibrie, kowey, luca, mdiaz, mornfall, simon, simonmar, thorkilnaur, volothamp
Assigned To: mdiaz
Topics: Hashed, Performance

Created on 2009-09-21.15:35:55 by kowey, last changed 2014-07-16.11:59:02 by noreply.

Messages
msg8855 Author: kowey Date: 2009-09-21.15:35:48
From issue1536.  This we can do without a format change.
msg12208 Author: kowey Date: 2010-08-16.10:47:53
Whoops! I just noticed this bug was wrongly assigned to Luca Capello, when 
it should have been assigned to Luca Molteni instead.  My apologies to 
both Lucas.
msg17561 Author: gh Date: 2014-06-19.23:27:09
I got curious about cache performance, so I wrote bash scripts that
generate 65K bogus cache files per cache subdir (patches/, inventories/
and pristine.hashed/), in both a non-bucketed and a bucketed layout:

http://hub.darcs.net/gh/bench/browse/do_cache_bogus.sh
http://hub.darcs.net/gh/bench/browse/do_cache_bogus_bucketed.sh
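
For readers who cannot reach the scripts, here is a minimal Python sketch of the two layouts being compared. It is not a port of the bash scripts above. The bucket naming (the first two characters of the file's hash), the use of SHA-256 digests as bogus names, and the function names are all assumptions made for illustration.

```python
import hashlib
import os


def flat_path(cache_subdir, file_hash):
    """Non-bucketed layout: every entry sits directly in the cache subdir."""
    return os.path.join(cache_subdir, file_hash)


def bucketed_path(cache_subdir, file_hash, prefix_len=2):
    """Bucketed layout: entries are grouped into subdirectories.

    Assumption: the bucket is named after the first `prefix_len`
    characters of the hash.
    """
    return os.path.join(cache_subdir, file_hash[:prefix_len], file_hash)


def populate(cache_subdir, count, bucketed):
    """Create `count` empty bogus entries named after SHA-256 hex digests."""
    for i in range(count):
        name = hashlib.sha256(str(i).encode()).hexdigest()
        if bucketed:
            path = bucketed_path(cache_subdir, name)
        else:
            path = flat_path(cache_subdir, name)
        # Create the bucket (or the cache subdir itself) on demand.
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w"):
            pass
```

Calling `populate(some_dir, 65000, bucketed=True)` would produce roughly the per-subdir file count described in this message, spread over at most 256 buckets when the names are hex digests.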

I compared cloning a repo over HTTP ( time darcs clone
--lazy http://hub.darcs.net/darcs/darcs-wiki ) with an empty cache and
with a big bogus cache, and found no relevant difference. I guess the
speed of accessing and writing a particular file is independent of how
many files are in its directory.

The slow operation is listing all the files in a big directory.
Currently darcs never needs to list files, but I tested it as follows on my
personal laptop (bought 3 years ago but still relevant):

time (tree cache/darcs/pristine.hashed/ |wc -l ) : 17s
time (tree cache/darcs/pristine.hashed/ |wc -l ) : 8s

When doing the same with 5 times as many files (325K files), the
difference grows to 1m35s versus 32s.
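
A rough Python equivalent of the listing measurement, for anyone who wants to repeat it without tree. This is a sketch only: the function names are made up here, and the numbers will not match tree | wc -l exactly, since tree also prints directory names and a summary line.

```python
import os
import time


def count_files(root):
    """Walk `root` recursively and count regular directory entries (files)."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total


def timed_count(root):
    """Return (file count, elapsed seconds) for one full listing of `root`."""
    start = time.perf_counter()
    total = count_files(root)
    elapsed = time.perf_counter() - start
    return total, elapsed
```

Running `timed_count` once on a flat cache subdir and once on a bucketed one gives the same kind of comparison as above. The timings only mean something if the kernel caches are cold for both runs.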

(I dropped the kernel caches between measurements with sudo sh -c 'echo 3 >
/proc/sys/vm/drop_caches' , and I'm using ext4 on both machines.)

So, I'm not expecting darcs itself to run faster or slower with the switch
to a bucketed cache right now, but I bet global cache garbage collection
will be faster (once it's implemented). Moreover, the global cache will be
more manageable by third-party programs (like ls or rm :-) ).
msg17609 Author: noreply Date: 2014-07-16.11:59:01
The following patch sent by Marcio Diaz <marcio.diaz@gmail.com> updated issue issue1624 with
status=resolved;resolvedin=2.10.0 HEAD

* resolve issue1624: bucketed cache. 
Ignore-this: 2d077f5c10156e4a00631fbc4f8c3119
History
Date                 User        Action  Args
2009-09-21 15:35:55  kowey       create
2009-09-21 15:37:11  kowey       link    issue1536 superseder
2009-10-23 22:37:17  admin       set     nosy: + marlowsd, - simonmar
2009-10-23 23:35:44  admin       set     nosy: + simonmar, - marlowsd
2009-11-15 17:09:00  tux_rocker  link    patch72 issues
2009-11-15 17:10:39  tux_rocker  set     nosy: + luca
                                         assignedto: luca
2010-03-01 13:21:16  kowey       set     topic: + Target-2.5, - Target-2.4
2010-04-06 11:03:44  kowey       set     topic: + Hashed
2010-06-15 20:52:07  admin       set     milestone: 2.5.0
2010-06-15 20:59:41  admin       set     topic: - Target-2.5
2010-07-25 14:34:55  tux_rocker  set     milestone: 2.5.0 -> 2.8.0
2010-08-16 10:47:54  kowey       set     assignedto: luca -> volothamp
                                         messages: + msg12208
                                         nosy: + volothamp
2012-05-11 20:01:04  gh          set     assignedto: volothamp -> (no value)
2013-07-22 12:51:45  gh          set     milestone: 2.8.0 -> 2.10.0
2014-03-22 23:59:48  gh          set     assignedto: mdiaz
                                         nosy: + mdiaz
2014-06-19 23:27:11  gh          set     nosy: + ganesh, simon
                                         messages: + msg17561
2014-07-16 11:59:02  noreply     set     status: needs-implementation -> resolved
                                         messages: + msg17609
                                         resolvedin: 2.10.0