darcs

Patch 343 First stab at a hashed-storage 0.6 port. (and 14 more)

Title First stab at a hashed-storage 0.6 port. (and 14 more)
Superseder adventure branch
View: 374
Nosy List benmos, kirby, kowey, mornfall
Related Issues
Status obsoleted Assigned To
Milestone

Created on 2010-08-11.23:43:13 by mornfall, last changed 2011-05-10.20:06:08 by darcswatch. Tracked on DarcsWatch.

Files
File name Status Uploaded Type Edit Remove
first-stab-at-a-hashed_storage-0_6-port_.dpatch mornfall, 2010-08-11.23:43:13 text/x-darcs-patch
unnamed mornfall, 2010-08-11.23:43:13
See mailing list archives for discussion on individual patches.
Messages
msg12127 (view) Author: mornfall Date: 2010-08-11.23:43:13
Hi,

this is a work-in-progress bundle. The major part of it is me ripping through
path handling code in darcs (there's lots). I have written a new Path module
for hashed-storage and this bundle mostly replaces all darcs (relative) path
handling by using that new module. The Path module is a work in progress as
well, especially implementation-wise. Nevertheless, I am sending this as a
heads-up because this could be somewhat disrputive.

I am also attaching the current version of the Path module from hashed-storage
(it's not on HEAD yet). Don't bother looking for implementation bugs, I will
need to write a testsuite for this anyway (and the Absolute paths didn't get
any testing. Just from looking at distance I can see a bug...). However, API
suggestions would be welcome. I have a vague plan of splitting the path code
from hashed-storage, bundling it with IO functions that take (Path p) => p type
paths for convenient usage.

Also, support for non-POSIX systems is nearly nonexistent (but to be honest,
the support in current darcs is not very coherent either). I will work on that
as well.

It also seems that removing a pack-unpack and other jiggling code, patch
application in hashed-storage monads has received a significant performance
boost, on the order of 10-15 %. Unfortunately, this does not seem to affect
pull as significantly (as check/repair), as it seems to be spending its
lifetime mostly elsewhere.

The current version seems to be passing the testsuite, too. There's still a lot
of TODO on the darcs side as well.

Sorry, I am becoming incoherent and need to go to bed. Hopefully we can have
sensible discussion later. I did want to get this off my disk though and gather
some eyeballs maybe.

Yours,
   Petr.

-----------------------------------------------------------------

module Storage.Hashed.Path
    ( Path, Relative, Absolute, (</>), Split(..), isPrefix, root, isRoot
    , absolute, relative, directory, singleton, parent, parents, file, Name, (+/+), suffix
    , pathToBS, pathToString, unsafePathFromBS, unsafePathFromString
    , parsePath, parsePathBS ) where

import qualified Data.ByteString.Char8 as BS
import Data.List( isPrefixOf, inits )

-------------------------------
-- Path utilities
--

type Name = BS.ByteString

class Path p where
  root :: p

  unpath :: p -> BS.ByteString
  path :: BS.ByteString -> p

  parsePathBS :: BS.ByteString -> p
  pathToBS :: p -> BS.ByteString

newtype Relative = Relative BS.ByteString deriving (Show, Eq, Ord)
newtype Absolute = Absolute BS.ByteString deriving (Show, Eq, Ord)

isRoot :: (Path p) => p -> Bool
isRoot = BS.null . unpath

instance Path Relative where
  root = Relative BS.empty
  unpath (Relative x) = x
  path = Relative

  pathToBS (unpath -> p) | BS.null p = "."
                         | otherwise = p

  parsePathBS path
     | BS.null path || path == "." = root
     | BS.head path == '/' = error $ "BUG: Not a relative path in Relative.parsePathBS " ++ shown
     | BS.take 2 path == "./" = parsePathBS $ BS.drop 2 path
     | otherwise = case BS.span (/= '/') path of
                        (head, tail) -> root </> head +/+ parsePathBS (BS.drop 1 tail)
    where shown = BS.unpack path


instance Path Absolute where
  root = Absolute $ BS.singleton '/'
  unpath (Absolute x) = BS.drop 1 x
  path = Absolute . BS.cons '/'

  pathToBS (unpath -> p) | BS.null p = "/"
                         | otherwise = p

relative :: Relative
relative = root

absolute :: Absolute
absolute = root

-- | Check whether a path is a prefix of another path.
isPrefix :: Path p => p -> p -> Bool
isPrefix (unpath -> a) (unpath -> b)
  | BS.null a = True
  | BS.length a == BS.length b = a == b
  | BS.isPrefixOf a b = BS.index b (BS.length a) == '/'
  | otherwise = False

(</>) :: forall p. (Show p, Path p) => p -> Name -> p
(unpath -> p) </> n
  | n == BS.pack "." = path p
  | n == BS.pack ".." = parent (path p :: p)
  | BS.null p = path $ BS.concat [unpath (root :: p), n]
  | BS.null n = path p
  | BS.elem '/' n = error $ "BUG: Path components may not contain slashes: "
                    ++ show (BS.unpack p) ++ ", " ++ BS.unpack n
  | otherwise = path $ BS.concat [p, BS.singleton '/', n]

data Split a b = a :/: b | Atomic

singleton :: Path p => p -> Maybe BS.ByteString
singleton (file -> dir :/: file)
  | isRoot dir = Just file
singleton _ = Nothing

directory :: Path p => p -> Split BS.ByteString Relative
directory (unpath -> p) = case BS.break (=='/') p of
  _ | BS.null p -> Atomic
  (dir, p') -> dir :/: path (BS.drop 1 p')

file :: Path p => p -> Split p BS.ByteString
file (unpath -> p) = case BS.breakEnd (=='/') p of
  _ | BS.null p -> Atomic
  (p', file) | BS.null p' -> root :/: file
             | otherwise -> (path $ BS.init p') :/: file

suffix :: (Show p, Path p) => p -> p -> Relative
suffix x@(unpath -> x') y@(unpath -> y')
  | BS.null x' = path y'
  | x `isPrefix` y = path $ BS.drop (BS.length x' + 1) y'
  | otherwise = error $ "BUG: Path " ++ show x ++ " is not a prefix of " ++ show y ++ "!"

-- unsafe
toNames (unpath -> p) = BS.split '/' p
fromNames names = path $ BS.intercalate (BS.singleton '/') names

(+/+) :: Path p => p -> Relative -> p
(unpath -> p) +/+ (unpath -> q)
  | BS.null q = path p
  | BS.null p = path q
  | otherwise = path $ BS.concat [ p, BS.singleton '/', q ]

parent :: (Show p, Path p) => p -> p
parent (file -> parent :/: _) = parent
parent path@(file -> Atomic) = error $ "BUG: " ++ show path ++ " has no parent."

parents :: forall p. Path p => p -> [p]
parents (file -> dir :/: _) = dir : parents dir
parents (file -> Atomic) = []

pathToString :: (Path p) => p -> String
pathToString = BS.unpack . pathToBS

unsafePathFromBS :: (Path p) => BS.ByteString -> p
unsafePathFromBS = path

unsafePathFromString :: (Path p) => String -> p
unsafePathFromString = path . BS.pack

parsePath :: Path a => String -> a
parsePath = parsePathBS . BS.pack

-----------------------------------------------------------------

15 patches for repository http://darcs.net:

Wed Aug 11 17:39:29 CEST 2010  Petr Rockai <me@mornfall.net>
  * First stab at a hashed-storage 0.6 port.

Wed Aug 11 21:25:55 CEST 2010  Petr Rockai <me@mornfall.net>
  * Move the preferences system into IO where it belongs.

Wed Aug 11 21:45:04 CEST 2010  Petr Rockai <me@mornfall.net>
  * Make FileName an alias to Relative (from Hashed.Storage.Path).

Wed Aug 11 22:12:49 CEST 2010  Petr Rockai <me@mornfall.net>
  * Fix annotate that got broken due to path format change.

Thu Aug 12 00:02:43 CEST 2010  Petr Rockai <me@mornfall.net>
  * Replace FilePath with FileName in SelectChanges and ChooseTouching.

Thu Aug 12 00:09:46 CEST 2010  Petr Rockai <me@mornfall.net>
  * Make SubPath just another alias for Relative.

Thu Aug 12 00:16:21 CEST 2010  Petr Rockai <me@mornfall.net>
  * Introduce a new Darcs.Path module to centralise path handling.

Thu Aug 12 00:36:15 CEST 2010  Petr Rockai <me@mornfall.net>
  * Merge Darcs.Patch.FileName into Darcs.Path.

Thu Aug 12 00:47:39 CEST 2010  Petr Rockai <me@mornfall.net>
  * Remove the now-redundant sp2fn.

Thu Aug 12 01:05:34 CEST 2010  Petr Rockai <me@mornfall.net>
  * Fix announceFiles in WhatsNew (abolish unsafePathFrom*).

Thu Aug 12 01:06:04 CEST 2010  Petr Rockai <me@mornfall.net>
  * Restore the ".." check in isMaliciousPath.

Thu Aug 12 01:29:40 CEST 2010  Petr Rockai <me@mornfall.net>
  * Fix a subtle bug in onlyHunks with rather curious side-effects.

Thu Aug 12 01:30:27 CEST 2010  Petr Rockai <me@mornfall.net>
  * Merge Darcs.RepoPath into Darcs.Path.

Thu Aug 12 01:31:25 CEST 2010  Petr Rockai <me@mornfall.net>
  * Drop unused and redundant pathFromFileName.

Thu Aug 12 01:32:08 CEST 2010  Petr Rockai <me@mornfall.net>
  * Make the tests pass (mostly ./foo versus foo in user-visible output).
Attachments
msg12140 (view) Author: kowey Date: 2010-08-12.12:47:40
On Wed, Aug 11, 2010 at 23:43:13 +0000, Petr Ročkai wrote:
> I have a vague plan of splitting the path code
> from hashed-storage, bundling it with IO functions that take (Path p) => p type
> paths for convenient usage.

Minor admin note (sorry!) Perhaps Ben Moseley (who was at the Utrecht
hacking sprint in 2008) would be interested in you becoming the
maintainer for the pathtype package?
http://hackage.haskell.org/package/pathtype

By the way, Duncan (bcc'ed) has expressed an interest in pathtype
before, so if he has anything to say about
http://bugs.darcs.net/patch343 perhaps he'll chime in.

> It also seems that removing a pack-unpack and other jiggling code, patch
> application in hashed-storage monads has received a significant performance
> boost, on the order of 10-15 %. Unfortunately, this does not seem to affect
> pull as significantly (as check/repair), as it seems to be spending its
> lifetime mostly elsewhere.

Still seems nice.

-- 
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
For a faster response, please try +44 (0)1273 64 2905.
msg12151 (view) Author: kowey Date: 2010-08-13.08:54:07
This is mostly an admin-level reply...

On Wed, Aug 11, 2010 at 23:43:13 +0000, Petr Ročkai wrote:
> I have written a new Path module for hashed-storage and this bundle
> mostly replaces all darcs (relative) path handling by using that new
> module.

Nice! You may find it useful to search the BTS for issues tagged
FilePath and status not resolved <http://is.gd/efHaI>.

Could you please summarise what's wrong with the current path handling?
(Crappy design or lack of design could be a good answer, but it's nice
to be a bit clear about this).

Are you specifically targeting performance issues (pack/unpack) or code
safety issues (list of path components better than strings to avoid
silly trailing slash stuff, typed paths), and/or something else?  I
think this discussion has taken place over some days on IRC and I
confess I have not really been following along. Sorry!

> Nevertheless, I am sending this as a heads-up
> because this could be somewhat disrputive.

Thanks! Somewhere in the past I think we'd agreed that major pieces of
work should be discussed first to reduce pressure.  But it's easy to
forget to do and I'm glad somebody's still remembering.  (I can't find
an archive for this, unfortunately; perhaps it should go on
http://wiki.darcs.net/Development/GettingStarted)

By the way, you asked about the feasibility of this work entering Darcs
2.8 (what comes after 2.5).  I replied that if we agree in principle
that the work is important and (I'll add now) that the work is "done
enough" to start merging it in, then maybe getting it in early in the
life cycle is best so that we have time to wring out the bugs.

> However, API suggestions would be welcome. I have a vague plan of
> splitting the path code from hashed-storage, bundling it with IO
> functions that take (Path p) => p type paths for convenient usage.

When I mentioned the pathtype package, I think you said it had a huge
API and that you'd rather focus on a limited set of operations that we
can make sure we get right
<http://irclog.perlgeek.de/darcs/2010-08-12#i_2694933>

I'm CC'ing Salvatore because I remember that in Zurich (over lunch), he
was telling me that we really really need to work on the filepath stuff.
Maybe he'll have comments to make.

> Also, support for non-POSIX systems is nearly nonexistent (but to be honest,
> the support in current darcs is not very coherent either). I will work on that
> as well.

One minor detail to remember is funny Windows paths for network shares
(\\something_or_another/blah).  I seem to remember we've never quite
gotten that stuff right.

-- 
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
For a faster response, please try +44 (0)1273 64 2905.
msg12161 (view) Author: kowey Date: 2010-08-14.11:12:10
Jason posted on IRC: ?tell kowey re:FilePath.  I can't speak for Petr, 
but the current stuff does not have a coherent design.  The 
packing/unpacking is inefficient.  We could use more type safety in our 
filepaths, but perhaps not in the way typepath works.  We need it to 
separate 'clean' paths from 'dirty' paths.  Dirty paths are ones that 
have not been transformed or checked to make sure they are sane.
msg12162 (view) Author: kowey Date: 2010-08-14.11:14:23
On Sat, Aug 14, 2010 at 11:12:11 +0000, Eric Kow wrote:
> We need it to separate 'clean' paths from 'dirty' paths.  Dirty paths
> are ones that have not been transformed or checked to make sure they
> are sane.

Aren't the current Darcs.RepoPath paths implicitly marked clean,
with the dirty ones being String?  The RepoPath work was never
really finished, just sort of inserted piecemeal into Darcs;
hopefully, Petr's attempt is a lot better.

-- 
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
For a faster response, try +44 (0)1273 64 2905 or
xmpp:kowey@jabber.fr (Jabber or Google Talk only)
msg12211 (view) Author: benmos Date: 2010-08-16.20:06:41
Just been pointed to this as I started doing a bit more work on 
'pathtype' recently. FWIW I'm working on adding basic Windows path 
support (UNC, drive letters etc). As I see it, the major differences 
are: pathtype's API is based on System.FilePath, pathtype uses phantom 
types (as opposed to newtypes and a typeclass), pathtype uses a File/Dir 
distinction. (As an aside, following some comments from Duncan and Wren, 
I'm wondering about expanding its Abs/Rel distinction to capture 
'Partial' paths). Anyway, I'd be interested in any thoughts.
msg12222 (view) Author: mornfall Date: 2010-08-18.18:46:15
@benmos: What would partial paths be? I also need the code to be 
efficient, and probably to store the paths in bytestrings. And I am not 
very fond of the filepath API, since it's quite big and somewhat 
unwieldy (although it could be that it can't be really improved much). 
Anyway, most of the hashed-storage code is about manipulating trees, so 
I need to traverse the tree using a path. The view patterns I introduced 
in here seem to be very convenient for that.
msg12382 (view) Author: kowey Date: 2010-08-31.09:52:57
Obsoleted by patch374 if I understand correctly
History
Date User Action Args
2010-08-11 23:43:13mornfallcreate
2010-08-11 23:44:45darcswatchsetdarcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-1ed13311268a357282cf7ca3523df562a7b40478
2010-08-12 12:47:40koweysetnosy: + kowey
messages: + msg12140
2010-08-13 08:54:08koweysetnosy: + kirby
messages: + msg12151
2010-08-14 11:12:11koweysetmessages: + msg12161
2010-08-14 11:14:23koweysetmessages: + msg12162
2010-08-16 20:06:41benmossetnosy: + benmos
messages: + msg12211
2010-08-18 18:46:16mornfallsetmessages: + msg12222
2010-08-31 09:52:57koweysetstatus: needs-review -> obsoleted
messages: + msg12382
superseder: + adventure branch
2011-05-10 20:06:08darcswatchsetdarcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-1ed13311268a357282cf7ca3523df562a7b40478 -> http://darcswatch.nomeata.de/repo_http:__darcs.net_reviewed.html#bundle-1ed13311268a357282cf7ca3523df562a7b40478