Review/discussion of the first patch:
> [add fileid info to the index format
> Jose Luis Neder <jlneder@gmail.com>**20130828204421
> Ignore-this: ae0b2040c64700707e4d1fc24979be95
> add some utility functions in AnchoredPath too for managing fileids.
> bump version to 0.5.11
> ] hunk ./Storage/Hashed/AnchoredPath.hs 1
> +{-# LANGUAGE CPP #-}
> -- | This module implements relative paths within a Tree. All paths are
> -- anchored at a certain root (this is usually the Tree root). They are
> -- represented by a list of Names (these are just strict bytestrings).
ok
> hunk ./Storage/Hashed/AnchoredPath.hs 7
> module Storage.Hashed.AnchoredPath
> ( Name(..), AnchoredPath(..), anchoredRoot, appendPath, anchorPath
> - , isPrefix, parent, parents, catPaths, flatten, makeName
> + , isPrefix, parent, parents, catPaths, flatten, makeName,
appendToName
> -- * Unsafe functions.
> hunk ./Storage/Hashed/AnchoredPath.hs 9
> - , floatBS, floatPath ) where
> + , floatBS, floatPath, replacePrefixPath
> + -- IO functions
> + , getFileID, getFileID' ) where
>
> import qualified Data.ByteString.Char8 as BS
> hunk ./Storage/Hashed/AnchoredPath.hs 14
> +import Control.Applicative( (<$>) )
> import Data.List( isPrefixOf, inits )
> import System.FilePath( (</>), splitDirectories, normalise,
dropTrailingPathSeparator )
>
> hunk ./Storage/Hashed/AnchoredPath.hs 18
> +import System.Posix.Types ( FileID )
> +import System.Directory ( doesFileExist, doesDirectoryExist )
> +#ifdef WIN32
> +import System.Win32.File ( createFile, getFileInformationByHandle,
BY_HANDLE_FILE_INFORMATION(..),
> + fILE_SHARE_NONE, fILE_FLAG_BACKUP_SEMANTICS,
> + gENERIC_NONE, oPEN_EXISTING, closeHandle )
> +#else
> +import System.PosixCompat ( fileID, getSymbolicLinkStatus )
> +#endif
> +
> -------------------------------
> -- AnchoredPath utilities
> --
All of the above are imports/exports changes, ok.
> hunk ./Storage/Hashed/AnchoredPath.hs 103
> where make ["."] = AnchoredPath []
> make x = AnchoredPath $ map (Name . BS.pack) x
>
> -
> anchoredRoot :: AnchoredPath
> anchoredRoot = AnchoredPath []
Please remove.
> hunk ./Storage/Hashed/AnchoredPath.hs 106
> +getFileID :: AnchoredPath -> IO (Maybe FileID)
> +getFileID = getFileID' . anchorPath ""
> +
> +getFileID' :: FilePath -> IO (Maybe FileID)
> +getFileID' fp = do file_exists <- doesFileExist fp
> + dir_exists <- doesDirectoryExist fp
> + if file_exists || dir_exists
> +#ifdef WIN32
> + then do handle <- createFile fp gENERIC_NONE
fILE_SHARE_NONE Nothing oPEN_EXISTING fILE_FLAG_BACKUP_SEMANTICS Nothing
> + fhnumber <- (Just . fromIntegral .
bhfiFileIndex) <$> getFileInformationByHandle handle
> + closeHandle handle
> + return fhnumber
> +#else
> + then (Just . fileID) <$> getSymbolicLinkStatus fp
> +#endif
> + else return Nothing
> +
I'm not an expert of the funcions used here (especially the Win32 ones)
but this seems ok.
However, I'd rather have them in the Index module instead of AnchoredPath
(see the description at the beginning of AnchoredPath module).
Or is there a particular reason for this location?
> +-- | Take a prefix path, the changed prefix path, and a path to change.
> +-- Assumes the prefix path is a valid prefix. If prefix is wrong return
> +-- AnchoredPath [].
> +replacePrefixPath :: AnchoredPath -> AnchoredPath -> AnchoredPath ->
AnchoredPath
> +replacePrefixPath (AnchoredPath []) b c = catPaths b c
> +replacePrefixPath (AnchoredPath (r:p)) b (AnchoredPath (r':p')) | r
== r' = replacePrefixPath (AnchoredPath p) b (AnchoredPath p')
> + |
otherwise = AnchoredPath []
> +replacePrefixPath _ _ _ = AnchoredPath []
> +
> +-- | Append a ByteString to the last Name of an AnchoredPath.
> +appendToName :: AnchoredPath -> String -> AnchoredPath
> +appendToName (AnchoredPath p) s = AnchoredPath (init p++[Name finalname])
> + where suffix = BS.pack s
> + finalname | suffix `elem`
(BS.tails lastname) = lastname
> + | otherwise =
BS.append lastname suffix
> + lastname = case last p of
> + Name name -> name
These 2 functions are used in darcs, to update paths of moved files.
I have a vague feeling that they should be in Darcs.UI.Commands.Util,
and added by the patch that implements --look-for-moves. Again, if there's
a particular reason for this location, please tell.
In any case, please save some horizontal space by doing a return when
introducing the guards.
> hunk ./Storage/Hashed/Index.hs 16
> -- copy. This means that every working file and directory has an
entry in the
> -- index, that contains its path and hash and validity data. The
validity data
> -- is a timestamp plus the file size. The file hashes are sha256's of the
> --- file's content.
> +-- file's content. It also contains the fileid for tracking of moved
files.
> --
> -- There are two entry types, a file entry and a directory entry.
Both have a
> -- common binary format (see 'Item'). The on-disk format is best
described by
I think "to track moved files" is better. Or there's a missing "the":
"..for the tracking..".
> hunk ./Storage/Hashed/Index.hs 46
> --
> -- The first word on the index \"line\" is the length of the file
path (which is
> -- the only variable-length part of the line). Then comes the path
itself, then
> --- fixed-length hash (sha256) of the file in question, then two
words, one for
> --- size and one "aux", which is used differently for directories and
for files.
> +-- fixed-length hash (sha256) of the file in question, then three
words, one for
> +-- size, one for "aux", which is used differently for directories and
for files, and
> +-- one for the fileid(inode or fhandle) of the file.
> --
> -- With directories, this aux holds the offset of the next sibling
line in the
> -- index, so we can efficiently skip reading the whole subtree
starting at a
Add a space: "fileid (inode".
> hunk ./Storage/Hashed/Index.hs 59
> -- For files, the aux field holds a timestamp.
>
> module Storage.Hashed.Index( readIndex, updateIndexFrom, indexFormatValid
> - , updateIndex , Index, filter )
> + , updateIndex, listFileIDs, Index, filter )
> where
>
> import Prelude hiding ( lookup, readFile, writeFile, catch, filter )
> hunk ./Storage/Hashed/Index.hs 80
> import System.Directory( removeFile )
> #endif
> import System.FilePath( (</>) )
> +import System.Posix.Types ( FileID )
>
> import Control.Monad( when )
> import Control.Exception.Extensible
> hunk ./Storage/Hashed/Index.hs 94
> , nullForeignPtr, c2w )
>
> import Data.IORef( )
> -import Data.Maybe( fromJust, isJust )
> +import Data.Maybe( fromJust, isJust, fromMaybe )
> import Data.Bits( Bits )
>
> import Foreign.Storable
ok
> hunk ./Storage/Hashed/Index.hs 102
> import Foreign.Ptr
>
> import Storage.Hashed.Hash( sha256, rawHash )
> +import Storage.Hashed.Tree()
>
> --------------------------
> -- Indexed trees
What is this for? There's already an import of Storage.Hashed.Tree above.
> hunk ./Storage/Hashed/Index.hs 124
> size_magic :: Int
> size_magic = 4 -- the magic word, first 4 bytes of the index
>
> -size_dsclen, size_hash, size_size, size_aux :: Int
> +size_dsclen, size_hash, size_size, size_aux, size_fileid :: Int
> size_size = 8 -- file/directory size (Int64)
> size_aux = 8 -- aux (Int64)
ok
> hunk ./Storage/Hashed/Index.hs 127
> +size_fileid = 8 -- fileid (inode or fhandle FileID)
> size_dsclen = 4 -- this many bytes store the length of the path
> size_hash = 32 -- hash representation
>
OK, 8 bytes seems big enough for storing inodes in any practical situation
(2^(8*8) > 10^19 possible different inodes).
> hunk ./Storage/Hashed/Index.hs 131
> -off_size, off_aux, off_hash, off_dsc, off_dsclen :: Int
> +off_size, off_aux, off_hash, off_dsc, off_dsclen, off_fileid :: Int
> off_size = 0
> off_aux = off_size + size_size
> hunk ./Storage/Hashed/Index.hs 134
> -off_dsclen = off_aux + size_aux
> +off_fileid = off_aux + size_aux
> +off_dsclen = off_fileid + size_fileid
> off_hash = off_dsclen + size_dsclen
> off_dsc = off_hash + size_hash
>
ok
> hunk ./Storage/Hashed/Index.hs 141
> itemAllocSize :: AnchoredPath -> Int
> itemAllocSize apath =
> - align 4 $ size_hash + size_size + size_aux + size_dsclen + 2 +
BS.length (flatten apath)
> + align 4 $ size_hash + size_size + size_aux + size_fileid +
size_dsclen + 2 + BS.length (flatten apath)
>
> itemSize, itemNext :: Item -> Int
> hunk ./Storage/Hashed/Index.hs 144
> -itemSize i = size_size + size_aux + size_dsclen + (BS.length $
iHashAndDescriptor i)
> +itemSize i = size_size + size_aux + size_fileid + size_dsclen +
(BS.length $ iHashAndDescriptor i)
> itemNext i = align 4 (itemSize i + 1)
>
> iPath, iHash, iDescriptor :: Item -> BS.ByteString
> hunk ./Storage/Hashed/Index.hs 156
> iSize i = plusPtr (iBase i) off_size
> iAux i = plusPtr (iBase i) off_aux
>
> +iFileID :: Item -> Ptr FileID
> +iFileID i = plusPtr (iBase i) off_fileid
> +
> itemIsDir :: Item -> Bool
> itemIsDir i = unsafeHead (iDescriptor i) == c2w 'D'
>
ok
> hunk ./Storage/Hashed/Index.hs 182
> (dsc_fp, dsc_start, dsc_len) = toForeignPtr dsc
> withForeignPtr fp $ \p ->
> withForeignPtr dsc_fp $ \dsc_p ->
> - do pokeByteOff p (off + off_dsclen) (xlate32 $
fromIntegral dsc_len :: Int32)
> + do fileid <- fromMaybe 0 <$> getFileID apath
> + pokeByteOff p (off + off_fileid) (xlate64 $
fromIntegral fileid :: Int64)
> + pokeByteOff p (off + off_dsclen) (xlate32 $
fromIntegral dsc_len :: Int32)
> memcpy (plusPtr p $ off + off_dsc)
> (plusPtr dsc_p dsc_start)
> (fromIntegral dsc_len)
ok
> hunk ./Storage/Hashed/Index.hs 213
> do xlatePoke64 (iSize item) size
> unsafePokeBS (iHash item) (rawHash hash)
>
> +updateFileID :: Item -> FileID -> IO ()
> +updateFileID item fileid = xlatePoke64 (iFileID item) $ fromIntegral
fileid
> updateAux :: Item -> Int64 -> IO ()
> updateAux item aux = xlatePoke64 (iAux item) $ aux
> updateTime :: forall a.(Enum a) => Item -> a -> IO ()
ok
> hunk ./Storage/Hashed/Index.hs 258
> , treeitem :: !(Maybe (TreeItem IO))
> , resitem :: !Item }
>
> +data ResultF = ResultF { nextF :: !Int
> + , resitemF :: !Item
> + , _fileIDs :: [((AnchoredPath, ItemType),
FileID)] }
> +
> readItem :: Index -> State -> IO Result
> readItem index state = do
> item <- peekItem (mmap index) (start state)
Maybe a haddock to explain what are ResultF and Result and their
respective fields?
> hunk ./Storage/Hashed/Index.hs 274
> readDir index state item =
> do following <- fromIntegral <$> xlatePeek64 (iAux item)
> exists <- fileExists <$> getFileStatusBS (iPath item)
> + fileid <- fromIntegral <$> (xlatePeek64 $ iFileID item)
> + fileid' <- fromMaybe fileid <$> (getFileID' $ BSC.unpack $
iPath item)
> + when (fileid == 0) $ updateFileID item fileid'
> let name it dirlen = Name $ (BS.drop (dirlen + 1) $
iDescriptor it) -- FIXME MAGIC
> namelength = (BS.length $ iDescriptor item) - (dirlength
state)
> myname = name item (dirlength state)
ok
> hunk ./Storage/Hashed/Index.hs 319
> do st <- getFileStatusBS (iPath item)
> mtime <- fromIntegral <$> (xlatePeek64 $ iAux item)
> size <- xlatePeek64 $ iSize item
> + exists <- doesFileExist $ BSC.unpack $ iPath item
> + fileid <- fromIntegral <$> (xlatePeek64 $ iFileID item)
> + fileid' <- fromMaybe fileid <$> (getFileID' $ BSC.unpack $
iPath item)
> let mtime' = modificationTime st
> size' = fromIntegral $ fileSize st
> readblob = readSegment (basedir index </> BSC.unpack
(iPath item), Nothing)
> hunk ./Storage/Hashed/Index.hs 325
> - exists = fileExists st
> we_changed = mtime /= mtime' || size /= size'
> hash = iHash' item
> when we_changed $
Why the change from fileExists to doesFileExist?
> hunk ./Storage/Hashed/Index.hs 331
> do hash' <- sha256 `fmap` readblob
> updateItem item size' hash'
> updateTime item mtime'
> + when (fileid == 0) $ updateFileID item fileid'
> return $ Result { changed = not exists || we_changed
> , next = start state + itemNext item
> , treeitem = if exists then Just $ File $ Blob
readblob hash else Nothing
ok
> hunk ./Storage/Hashed/Index.hs 348
> Just (SubTree tree) -> return $ filter (predicate index) tree
> _ -> fail "Unexpected failure in updateIndex!"
>
> +listFileIDs :: Index -> IO ([((AnchoredPath, ItemType), FileID)])
> +listFileIDs EmptyIndex = return []
> +listFileIDs index =
> + do let initial = State { start = size_magic
> + , dirlength = 0
> + , path = AnchoredPath [] }
> + res <- readItemFileIDs index initial
> + return $ _fileIDs res
> +
Please haddock it, this is the new exported function of the library.
> +readItemFileIDs :: Index -> State -> IO ResultF
> +readItemFileIDs index state = do
> + item <- peekItem (mmap index) (start state)
> + res' <- if itemIsDir item
> + then readDirFileIDs index state item
> + else readFileFileID index state item
> + return res'
> +
> +readDirFileIDs :: Index -> State -> Item -> IO ResultF
> +readDirFileIDs index state item =
> + do fileid <- fromIntegral <$> (xlatePeek64 $ iFileID item)
> + following <- fromIntegral <$> xlatePeek64 (iAux item)
> + let name it dirlen = Name $ (BS.drop (dirlen + 1) $
iDescriptor it) -- FIXME MAGIC
> + namelength = (BS.length $ iDescriptor item) - (dirlength
state)
> + myname = name item (dirlength state)
> + substate = state { start = start state + itemNext item
> + , path = path state `appendPath` myname
> + , dirlength = if myname == Name
(BSC.singleton '.')
> + then dirlength state
> + else dirlength state +
namelength }
> + subs off | off < following = do
> + result <- readItemFileIDs index $ substate { start = off }
> + rest <- subs $ nextF result
> + return $! (name (resitemF result) $ dirlength substate,
result) : rest
> + subs coff | coff == following = return []
> + | otherwise = fail $ "Offset mismatch at " ++
show coff ++
> + " (ends at " ++ show
following ++ ")"
> + inferiors <- subs $ start substate
> + return $ ResultF { nextF = following
> + , resitemF = item
> + , _fileIDs = (((path substate, TreeType),
fileid):concatMap (_fileIDs . snd) inferiors) }
> +
> +readFileFileID :: Index -> State -> Item -> IO ResultF
> +readFileFileID _ state item =
> + do fileid' <- fromIntegral <$> (xlatePeek64 $ iFileID item)
> + let name it dirlen = Name $ (BS.drop (dirlen + 1) $
iDescriptor it)
> + myname = name item (dirlength state)
> + return $ ResultF { nextF = start state + itemNext item
> + , resitemF = item
> + , _fileIDs = [((path state `appendPath`
myname, BlobType), fileid')] }
> +
> +
> -- | Read an index and build up a 'Tree' object from it, referring to
current
> -- working directory. The initial Index object returned by readIndex
is not
> -- directly useful. However, you can use 'Tree.filter' on it. Either
way, to
To be sure that I get it:
the 3 functions above just read the index and return the fileid
information, they do *not* modify the index nor access the disk to read
fileid's,
right?
> hunk ./Storage/Hashed/Index.hs 423
> formatIndex :: ForeignPtr () -> Tree IO -> Tree IO -> IO ()
> formatIndex mmap_ptr old reference =
> do create (SubTree reference) (AnchoredPath []) size_magic
> - unsafePokeBS magic (BSC.pack "HSI4")
> + unsafePokeBS magic (BSC.pack "HSI5")
> where magic = fromForeignPtr (castForeignPtr mmap_ptr) 0 4
> create (File _) path' off =
> do i <- createItem BlobType path' mmap_ptr off
Change index version prefix, ok.
> hunk ./Storage/Hashed/Index.hs 481
> indexFormatValid path' =
> do magic <- mmapFileByteString path' (Just (0, size_magic))
> return $ case BSC.unpack magic of
> - "HSI4" -> True
> + "HSI5" -> True
> _ -> False
> `catch` \(_::SomeException) -> return False
>
Change index version prefix, ok.
> hunk ./Storage/Hashed/Tree.hs 55
> | SubTree !(Tree m)
> | Stub !(m (Tree m)) !Hash
>
> -data ItemType = BlobType | TreeType deriving (Show, Eq)
> +data ItemType = TreeType |BlobType deriving (Show, Eq, Ord)
>
> -- | Abstraction of a filesystem tree.
> -- Please note that the Tree returned by the respective read
operations will
I wondered why Ord was necessary, so I removed it and compiled
hashed-storage
and darcs again, and saw that it was required by the use of a `sortBy` in
`Darcs.UI.Commands.Util.resolveMoves`.
I guess the point of being able to sorting ItemType's is that directories
appear before files when one does `record --look-for-moves`?
Anyway, comment it (and add that missing space).
> hunk ./Storage/Hashed/Tree.hs 117
> --
>
> -- | Look up a 'Tree' item (an immediate subtree or blob).
> +
> lookup :: Tree m -> Name -> Maybe (TreeItem m)
> lookup t n = M.lookup n (items t)
>
Please remove.
> hunk ./hashed-storage.cabal 2
> name: hashed-storage
> -version: 0.5.9
> +version: 0.5.11
> synopsis: Hashed file storage support code.
>
> description: Support code for reading and manipulating hashed file
storage
ok.
> hunk ./hashed-storage.cabal 76
> dataenc,
> binary,
> zlib,
> - mmap >= 0.5 && < 0.6
> + mmap >= 0.5 && < 0.6,
> + unix-compat >= 0.1.2
> +
> + if os(windows)
> + cpp-options: -DWIN32
> + build-depends: Win32 >= 2.0
>
> c-sources: Bundled/sha2.c
>
This is the same as in darcs.cabal, ok.
> hunk ./hashed-storage.cabal 101
> Storage.Hashed.Test
> c-sources: Bundled/sha2.c
>
> + if os(windows)
> + cpp-options: -DWIN32
> + build-depends: Win32 >= 2.0
> +
> if flag(test)
> build-depends: test-framework,
> test-framework-hunit,
ok.
> hunk ./hashed-storage.cabal 108
> - test-framework-quickcheck2,
> + test-framework-quickcheck2 >= 0.3,
> QuickCheck >= 2.3, HUnit, process >= 1.0.1,
zip-archive
> else
> buildable: False
I guess this is for GHC 7.6 support?
|