I got curious about cache performance, so I wrote bash scripts that
generate 65K bogus cache files per cache subdirectory (patches/,
inventories/ and pristine.hashed/), in both non-bucketed and bucketed
layouts:
http://hub.darcs.net/gh/bench/browse/do_cache_bogus.sh
http://hub.darcs.net/gh/bench/browse/do_cache_bogus_bucketed.sh
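Roughly, the difference between the two layouts is the following (only a
sketch, showing a single subdir and an illustrative two-character bucketing
scheme, not the exact contents of the scripts above):

  # Flat (non-bucketed) layout: every file directly under pristine.hashed/.
  mkdir -p cache-flat/darcs/pristine.hashed
  for i in $(seq 1 65536); do
    h=$(printf '%s' "$i" | sha256sum | cut -d' ' -f1)
    touch "cache-flat/darcs/pristine.hashed/$h"
  done

  # Bucketed layout: files spread into subdirectories keyed by the first two
  # characters of the hash, so no single directory holds more than a few
  # hundred entries.
  for i in $(seq 1 65536); do
    h=$(printf '%s' "$i" | sha256sum | cut -d' ' -f1)
    mkdir -p "cache-bucketed/darcs/pristine.hashed/${h:0:2}"
    touch "cache-bucketed/darcs/pristine.hashed/${h:0:2}/$h"
  done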
I ran a test comparing cloning a repo over http ( time darcs clone
--lazy http://hub.darcs.net/darcs/darcs-wiki ) with an empty cache and
with a big bogus cache, and found no relevant difference. I guess the
speed of accessing and writing a particular file is independent of how
many files are in its directory.
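For reference, the comparison was along these lines (the global cache
location is an assumption here; it may be ~/.cache/darcs or ~/.darcs/cache
depending on the setup):

  # 1. Clone with an empty global cache.
  rm -rf ~/.cache/darcs && mkdir -p ~/.cache/darcs
  time darcs clone --lazy http://hub.darcs.net/darcs/darcs-wiki wiki-empty-cache

  # 2. Fill the cache with 65K bogus files per subdir (scripts above),
  #    then clone again.
  time darcs clone --lazy http://hub.darcs.net/darcs/darcs-wiki wiki-bogus-cache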
The slow operation, rather, is listing all the files in a big directory.
Currently darcs never needs to list files, but I tested it as follows on
my personal laptop (bought 3 years ago but still relevant):
time (tree cache/darcs/pristine.hashed/ | wc -l) : 17s  (non-bucketed)
time (tree cache/darcs/pristine.hashed/ | wc -l) : 8s   (bucketed)
When doing the same with 5 times as many files (325K files), the
difference grows to 1m35s versus 32s.
(I reset the cache between each measurement with sudo sh -c 'echo 3 >
/proc/sys/vm/drop_caches' , and I'm using ext4 on both machines.)
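In other words, each measurement was essentially this (the cache paths are
illustrative):

  for dir in cache-flat/darcs/pristine.hashed cache-bucketed/darcs/pristine.hashed; do
    sync
    sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'   # cold-cache each run
    echo "$dir:"
    time ( tree "$dir" | wc -l )
  done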
So I'm not expecting darcs to run faster with the switch to a bucketed
cache right now, nor slower, but I bet global cache garbage collection
will be faster (when it's implemented). Moreover, the global cache will
be more manageable by third-party programs (like ls or rm :-) ).
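For example, a manual cleanup of a bucketed cache can proceed bucket by
bucket and never has to list more than a few hundred entries at once
(purely hypothetical; darcs has no such command today):

  # Delete cache files not accessed in the last 90 days, one bucket at a time.
  for bucket in cache-bucketed/darcs/pristine.hashed/*/; do
    find "$bucket" -type f -atime +90 -delete
  done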