Created on 2009-08-18.08:36:25 by kowey, last changed 2024-08-11.10:13:39 by bfrk.
msg8228 |
Author: kowey |
Date: 2009-08-18.08:36:19 |
|
Darcs caches can get very large, which can make looking up files in them slow.
This is really a hashed-storage feature request, so we should probably link to
its ticket, but I'm creating this task because it's particularly relevant to
darcs and I think we should be keeping track of it.
|
msg8597 |
Author: simonmar |
Date: 2009-08-30.13:26:50 |
|
I have some figures for how much this may be affecting performance on Linux with
a recent ext3 (kernel 2.6.28).
For a GHC repository with 21000 files in _darcs/patches, I used a program that
opens and closes every file in the directory.
- cold cache: 14.70s real 0.15s user 0.62s system
- warm cache: 0.26s real 0.09s user 0.14s system
(to flush the cache before running the test, I used
"echo 3 >/proc/sys/vm/drop_caches")
After making 16 subdirectories 0/ 1/ ... e/ f/ and splitting the patches amongst
the subdirectories:
- cold cache: 4.70s real 0.09s user 0.74s system
- warm cache: 0.24s real 0.11s user 0.12s system
Conclusion: with a warm cache, there's no difference - presumably Linux's name
lookup cache is big enough to hold all 21k lookups. Without anything cached,
the subdirectory version is 3x faster (but the difference is all in real time,
not system time, which implies that this is due to reading less data from disk
rather than poor algorithms in the kernel's lookup code).
Program I used to measure this:
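import System.IO
import Control.Monad
import System.Posix
import System.Environment
import qualified Data.ByteString.Char8 as B

-- Open and close every file named in the list file given as the only argument.
main = do
  [file] <- getArgs
  -- The list file holds one path per line.
  ls <- B.split '\n' `fmap` B.readFile file
  forM_ ls $ \file -> do
    let str = B.unpack file
    when (not (null str)) $ do
      -- openFd as in unix < 2.8; later versions drop the 'Nothing' argument.
      fd <- openFd str ReadOnly Nothing defaultFileFlags
      closeFd fd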
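The message does not say how the patches were distributed amongst the 16 subdirectories. As a minimal sketch, assuming the bucket is the first hex digit of the hash that follows an optional "<size>-" prefix in the file name, a move plan could be computed like this (`bucketOf` and `movePlan` are hypothetical names, not part of darcs or hashed-storage):

```haskell
import Data.Char (isHexDigit, toLower)

-- | Pick one of the 16 subdirectories "0".."f" for a patch file name.
-- Assumption: names look like "<size>-<hex hash>" or just "<hex hash>",
-- and the bucket is the first hex digit of the hash part.
bucketOf :: FilePath -> Maybe FilePath
bucketOf name =
  case hashPart of
    (c : _) | isHexDigit c -> Just [toLower c]
    _                      -> Nothing
  where
    hashPart = case break (== '-') name of
      (_, '-' : h) -> h      -- drop the size prefix and the dash
      _            -> name   -- no prefix present

-- | Pair every recognised file name with its new, bucketed location.
-- Names that do not start with a hex digit are left out of the plan.
movePlan :: [FilePath] -> [(FilePath, FilePath)]
movePlan names = [ (n, b ++ "/" ++ n) | n <- names, Just b <- [bucketOf n] ]

main :: IO ()
main = mapM_ print (movePlan ["0000000042-a1b2", "0000000007-0fee", "notes.txt"])
```

The plan is kept pure so it can be inspected before any file is actually renamed.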
|
msg8856 |
Author: kowey |
Date: 2009-09-21.15:37:02 |
|
I'm splitting off issue1624 into a separate ticket. This requires a format
change, so it may be best to lump it in together with other things like packs.
|
msg11494 |
Author: tux_rocker |
Date: 2010-06-20.13:47:06 |
|
I'm going to bump this to 2.6 because no-one is working on it right now
and it requires a format change.
|
msg24073 |
Author: bfrk |
Date: 2024-08-10.17:39:52 |
|
I am re-opening this issue because it turns out that this is not just a
matter of performance. On simple file systems like FAT32 it leads to
failures with large enough repositories (like that of darcs itself). The
error message (on Linux) is, unfortunately, a misleading/unhelpful "no
space left on device" (ENOSPC). Patch file names are 76 bytes long, so
according to
https://superuser.com/questions/446282/max-files-per-directory-on-ntfs-vol-vs-fat32/1544848#1544848,
on FAT32 the maximum is
(2^16 - 3) / ceil (76 / 13) ~= 11000 patch files.
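The arithmetic behind this estimate can be checked with a small sketch. It reads the estimate as the directory's entry budget divided by the per-file cost, which is the reading that yields roughly 11000. The constants (2^16 - 3 usable entries, 13 characters per entry) are taken from the message and the linked answer, not verified against the FAT specification, and the function names are hypothetical.

```haskell
-- | Directory entries consumed by one file whose name has the given length,
-- assuming each entry stores up to 13 characters of the name.
entriesPerName :: Int -> Int
entriesPerName nameLen = (nameLen + 12) `div` 13   -- integer ceil (nameLen / 13)

-- | Estimated maximum number of files with names of the given length,
-- assuming 2^16 - 3 usable entries per directory.
maxFiles :: Int -> Int
maxFiles nameLen = (2 ^ (16 :: Int) - 3) `div` entriesPerName nameLen

main :: IO ()
main = do
  print (entriesPerName 76)  -- 6
  print (maxFiles 76)        -- 10922
```

Shorter file names lower the per-file cost and therefore raise the limit under the same assumptions.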
The practical relevance is that FAT32 is often used for temporary storage
e.g. on a USB memory stick.
This should be fixed in darcs-3 by using bucketed hashed dirs for
repositories, too.
|
msg24074 |
Author: bfrk |
Date: 2024-08-11.10:13:39 |
|
When we fix this we should also drop the (unused) size prefixes and
change the file names so as not to repeat the two hex digits in the
bucket. Like in
(bucketdir,filename) = splitAt 2 (to_file_path hash)
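A minimal sketch of that naming scheme, assuming current file names have the form "<size>-<hex hash>" and that the hash is already available as a hex string (`dropSizePrefix` and `bucketedPath` are hypothetical helpers, not existing darcs functions):

```haskell
-- | Drop an optional "<size>-" prefix, keeping only the hash.
-- Assumption: names look like "0000000123-<hex hash>" or just "<hex hash>".
dropSizePrefix :: String -> String
dropSizePrefix name =
  case break (== '-') name of
    (_, '-' : hash) -> hash
    _               -> name

-- | Split a file name into (bucket directory, file name inside the bucket).
-- The two leading hex digits name the bucket and are not repeated in the
-- file name.
bucketedPath :: String -> (FilePath, FilePath)
bucketedPath = splitAt 2 . dropSizePrefix

main :: IO ()
main = print (bucketedPath "0000000123-abcdef0123")
```

With two hex digits per bucket this gives 256 directories, and concatenating the bucket name and the file name restores the full hash.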
|
|
Date | User | Action | Args
2009-08-18 08:36:25 | kowey | create |
2009-08-25 18:15:31 | admin | set | nosy: + darcs-devel, - simon
2009-08-27 14:25:58 | admin | set | nosy: kowey, darcs-devel, thorkilnaur, kolibrie, dmitry.kurochkin, mornfall
2009-08-30 13:26:53 | simonmar | set | nosy: + simonmar; messages: + msg8597
2009-09-14 10:57:37 | kowey | set | topic: + Target-2.4; nosy: kowey, darcs-devel, simonmar, thorkilnaur, kolibrie, dmitry.kurochkin, mornfall
2009-09-21 15:37:11 | kowey | set | status: needs-implementation -> waiting-for; title: break caches up into subdirectories -> break repo caches up into subdirectories; nosy: kowey, darcs-devel, simonmar, thorkilnaur, kolibrie, dmitry.kurochkin, mornfall; messages: + msg8856; topic: + Target-2.5, - Target-2.4; superseder: + break global cache up into subdirectories
2009-10-23 22:36:51 | admin | set | nosy: + marlowsd, - simonmar
2009-10-23 23:35:21 | admin | set | nosy: + simonmar, - marlowsd
2010-03-25 14:21:39 | kowey | set | topic: + Hashed
2010-06-15 20:52:01 | admin | set | milestone: 2.5.0
2010-06-15 20:59:28 | admin | set | topic: - Target-2.5
2010-06-20 13:47:07 | tux_rocker | set | nosy: + tux_rocker; messages: + msg11494; milestone: 2.5.0 -> 2.8.0
2014-06-19 23:24:24 | gh | set | nosy: + ganesh
2017-07-31 01:28:43 | gh | set | status: waiting-for -> given-up
2024-08-10 17:39:53 | bfrk | set | status: given-up -> unknown; messages: + msg24073; milestone: 2.8.0 -> 3.0.0
2024-08-11 10:13:39 | bfrk | set | messages: + msg24074