darcs

Patch 127 Re: patches to Darcs.Commands.ShowFiles, and repository internals

Title Re: patches to Darcs.Commands.ShowFiles, and repository internals
Superseder Nosy List dagit, dbp, duncan, ganesh, kowey, mornfall, sriram.durbha, thomashartman1
Related Issues darcs library is not thread-safe (uses current working directory)
View: 1774
Status obsoleted Assigned To thomashartman1
Milestone

Created on 2010-01-02.18:18:28 by thomashartman1, last changed 2012-01-01.00:33:46 by kowey.

Files
File name Status Uploaded Type Edit Remove
in-show-files_-break-out-manifestcmd_-helper-function-with-result-type-io-_filepath__-as-baby-step-towards-a-useful-darcs-api.dpatch thomashartman1, 2010-01-02.22:19:20 application/octet-stream
See mailing list archives for discussion on individual patches.
Messages
msg9717 (view) Author: thomashartman1 Date: 2010-01-02.18:18:27
[patch] 	would be useful to patch-tag. (baby steps towards darcs API,
[patch] 	and factoring out getCurrentDirectory dependence)

There are a couple issues I'm facing with patch-tag, some of which may
also be shared by gitit, and most of which have to do with darcs api
(or lack thereof).

I would like to be able to do actions like
-- list files in a repository subdirectory, as of a certain patch
-- list repository history, or some subset thereof

without

a) forking a process with a shell call to darcs inside of it, which
must then trap stdout and parse the output. For darcs changes
--xml-output especially, this is quite slow. (threads are cheap,
processes are expensive)
b) calling setCurrentDirectory, because this seems like a likely cause
of problems if two repositories are attempted to be read
simultaneously (of course this is not a problem if using heavyweight
processes, but as said that's way more expensive.) I believe this
could also bite patch-tag, for example, when serving static content.
In general, even without bringing the need for an API in, it seems
like currentDirectory agnosticism is a better way to go.

The attached patches are mostly improvements in this direction, along
with a few that are more just readability or style.

I hope that all of these can be applied, though not getting my hopes
up for darcs 2.4 given the freeze eh?

Best regards,

Thomas.

-- 
Need somewhere to put your code? http://patch-tag.com
Want to build a webapp? http://happstack.com
msg9718 (view) Author: thomashartman1 Date: 2010-01-02.22:19:20
oops, patches are attached.

Note that there are several patches. I forget if this is the wrong way
to do things.

Please let me know if I should do one at a time.

On Sat, Jan 2, 2010 at 12:18 PM, Thomas Hartman
<thomashartman1@googlemail.com> wrote:
> There are a couple issues I'm facing with patch-tag, some of which may
> also be shared by gitit, and most of which have to do with darcs api
> (or lack thereof).
>
> I would like to be able to do actions like
> -- list files in a repository subdirectory, as of a certain patch
> -- list repository history, or some subset thereof
>
> without
>
> a) forking a process with a shell call to darcs inside of it, which
> must then trap stdout and parse the output. For darcs changes
> --xml-output especially, this is quite slow. (threads are cheap,
> processes are expensive)
> b) calling setCurrentDirectory, because this seems like a likely cause
> of problems if two repositories are attempted to be read
> simultaneously (of course this is not a problem if using heavyweight
> processes, but as said that's way more expensive.) I believe this
> could also bite patch-tag, for example, when serving static content.
> In general, even without bringing the need for an API in, it seems
> like currentDirectory agnosticism is a better way to go.
>
> The attached patches are mostly improvements in this direction, along
> with a few that are more just readability or style.
>
> I hope that all of these can be applied, though not getting my hopes
> up for darcs 2.4 given the freeze eh?
>
> Best regards,
>
> Thomas.
>
> --
> Need somewhere to put your code? http://patch-tag.com
> Want to build a webapp? http://happstack.com
>



-- 
Need somewhere to put your code? http://patch-tag.com
Want to build a webapp? http://happstack.com
Attachments
msg9720 (view) Author: ganesh Date: 2010-01-03.21:24:20
In general I think this looks good, some detailed comments/requests for
changes/further explanation below.

As you say this is a baby step and noone should rely on the precise signature of
the resulting API functions, though we should certainly aim to always provide
this kind of functionality in some form.

> [in show files, break out manifestCmd' helper function with result type IO
[FilePath], as baby step towards a useful darcs api
> thomashartman1@gmail.com**20100102140103
>  Ignore-this: 5290a1b50a4a23a961138920de7bc6b4
> ]

This patch has a spurious whitespace change at the end.

Also, I'm not convinced by the naming of "filesCmd'" for an exported function
intended to be in the API. I'd suggest renaming things appropriately so this can
be "filesCmd" or "filesCommand".

> [simplify output subfunction
> thomashartman1@gmail.com**20100102161308
> Ignore-this: b32d46fdfa02ee74425966747a711226
> ]
> -  where output_null name = do { putStr name ; putChar '\0' }
> -        output = if NullFlag `elem` opts then output_null else putStrLn
> +  where output name = do putStr name
> +                         putChar $ if NullFlag `elem` opts then '\0' else '\n'

Is this change correct on Windows? I suspect so but we should check.

> [make monad pipeline explicit
> thomashartman1@gmail.com**20100102174548
>  Ignore-this: 8f10d4403232fc96f55834833736403c
> ]

> -         do x <- do pid <- runProcess c args Nothing Nothing (Just h) Nothing
Nothing
> -                    waitForProcess pid
> +         do x <- do waitForProcess  =<< runProcess c args Nothing Nothing
(Just h) Nothing Nothing
>              when (x == ExitFailure 127) $

In general this kind of change feels rather like churn, though I think in this
particular case it might be a slight improvement. However you have left in an
unnecessary "do".

> [make slurpPristine use absolute paths (no more need to wrap in
getCurrrentDirectory/withCurrentDirectory)
> thomashartman1@gmail.com**20100102175426
>  Ignore-this: 9e0d47590c4b92d9c6c774730136113e
> ]

I'm not convinced this patch is correct, since mmap_slurp itself looks at the
current directory. I definitely support the intention though. Could you provide
more justification of why it's ok?
msg9789 (view) Author: mornfall Date: 2010-01-11.23:43:44
Hi,

Ganesh Sittampalam <bugs@darcs.net> writes:

>> [make slurpPristine use absolute paths (no more need to wrap in
> getCurrrentDirectory/withCurrentDirectory)
>> thomashartman1@gmail.com**20100102175426
>>  Ignore-this: 9e0d47590c4b92d9c6c774730136113e
>> ]
>
> I'm not convinced this patch is correct, since mmap_slurp itself looks at the
> current directory. I definitely support the intention though. Could you provide
> more justification of why it's ok?

One more thing to keep in mind with this kind of patches is that
handling absolute paths (and probably more so with FilePath/String-based
paths) could be potentially much more expensive than handling relative
paths. Path manipulation is a nontrivial portion of the darcs processing
time and doing this kind of change could slow down things across the
board.

In a repository that lives at 40-character deep absolute path, with
10000 files, the path operations done for pristine need to churn over 3
megabytes extra of the common prefixes (10000 * 40 * 8 -- assuming 8
bytes per String element). If the paths are stored (not sure right now),
the prefixes cannot be shared due to nature of Haskell lists.

Moreover, the index stores relative paths, since the repository root can
relocate. If we required absolute paths to be used in stat/open calls,
this would force darcs to copy each path out of the mmap'd index into
the heap, concatenating it with current path, again an operation that is
not completely free (the ability to pass pointers into the mmap area
without further manipulation directly to stat proved to be an important
optimisation).

I would say that if you are running on Linux, the right answer to your
problem is omitting CLONE_FS from the clone(...) flags (which makes the
threads to have their own working directory). I am not sure this is
achievable through the pthread interface, though.

Yours,
   Petr.
msg9803 (view) Author: thomashartman1 Date: 2010-01-12.23:20:33
What and where are the  CLONE_FS (...) flags ?

...darcs.net $ find -type d -name _darcs -prune -o -name *.hs -print |
xargs -i grep CLONE {} # (no output)



2010/1/11 Petr Ročkai <bugs@darcs.net>:
>
> Petr Ročkai <me@mornfall.net> added the comment:
>
> Hi,
>
> Ganesh Sittampalam <bugs@darcs.net> writes:
>
>>> [make slurpPristine use absolute paths (no more need to wrap in
>> getCurrrentDirectory/withCurrentDirectory)
>>> thomashartman1@gmail.com**20100102175426
>>>  Ignore-this: 9e0d47590c4b92d9c6c774730136113e
>>> ]
>>
>> I'm not convinced this patch is correct, since mmap_slurp itself looks at the
>> current directory. I definitely support the intention though. Could you provide
>> more justification of why it's ok?
>
> One more thing to keep in mind with this kind of patches is that
> handling absolute paths (and probably more so with FilePath/String-based
> paths) could be potentially much more expensive than handling relative
> paths. Path manipulation is a nontrivial portion of the darcs processing
> time and doing this kind of change could slow down things across the
> board.
>
> In a repository that lives at 40-character deep absolute path, with
> 10000 files, the path operations done for pristine need to churn over 3
> megabytes extra of the common prefixes (10000 * 40 * 8 -- assuming 8
> bytes per String element). If the paths are stored (not sure right now),
> the prefixes cannot be shared due to nature of Haskell lists.
>
> Moreover, the index stores relative paths, since the repository root can
> relocate. If we required absolute paths to be used in stat/open calls,
> this would force darcs to copy each path out of the mmap'd index into
> the heap, concatenating it with current path, again an operation that is
> not completely free (the ability to pass pointers into the mmap area
> without further manipulation directly to stat proved to be an important
> optimisation).
>
> I would say that if you are running on Linux, the right answer to your
> problem is omitting CLONE_FS from the clone(...) flags (which makes the
> threads to have their own working directory). I am not sure this is
> achievable through the pthread interface, though.
>
> Yours,
>   Petr.
>
> ----------
> nosy: +mornfall
>
> __________________________________
> Darcs bug tracker <bugs@darcs.net>
> <http://bugs.darcs.net/patch127>
> __________________________________
> _______________________________________________
> darcs-users mailing list
> darcs-users@darcs.net
> http://lists.osuosl.org/mailman/listinfo/darcs-users
>
msg9807 (view) Author: mornfall Date: 2010-01-13.07:13:28
Hi,

Thomas Hartman <bugs@darcs.net> writes:
> What and where are the  CLONE_FS (...) flags ?
>
> ...darcs.net $ find -type d -name _darcs -prune -o -name *.hs -print |
> xargs -i grep CLONE {} # (no output)
sorry, I was referring to the Linux system call clone, which is used to
create new threads. In Linux, threads can be created in such a way that
each thread has its own working directory. This would likely solve your
problem of darcs calling and relying on setCurrentDirectory, IIUIC?

(But it is problematic in other ways, since this does not seem to be
available from neither pthread API nor GHC threading API...)

Yours,
   Petr.
msg9823 (view) Author: thomashartman1 Date: 2010-01-14.20:21:29
With regards to http://bugs.darcs.net/patch127

> (But it is problematic in other ways, since this does not seem to be
> available from neither pthread API nor GHC threading API...)

I am interested in solving this problem but I admit a couple things.

1) I am not sure it is really a problem
2) This is pretty unfamiliar terrain for me.

Is it really a problem? I suspect it is, but I can't currently prove
it. I would like to introduce darcs api functions that return lists of
files/changes et al, and call these api functions directly from
patch-tag/gitit. If two users simultaneously query some informatin
about a repo, and cwd changes while in the middle of such an api call,
we're in trouble right? If we're not in trouble, of course I should
not be fixing a non-problem.

FWIW I'm not even currently running patch-tag server compiled with
-threaded. Does that mean I'm safe(r) in the above scenario? Happstack
(I assume) uses forkIOs or some equivalent in its implementation for
handling requests.

Suggestions for determining whether this is a problem or not, are welcome.

2) Assuming the answer to 1 is, it's a problem: where to begin? Solve
it in pthreads, solve it in the GHC threading API, or solve it somehow
else? I have not used the pthreads api from haskell (nor anywhere
else), can someone point me to a tutorial or howto? I assume the GHC
threading API is basically Control.Concurrent... is this right?

Guidance welcome here as well.


2010/1/13 Petr Ročkai <bugs@darcs.net>:
>
> Petr Ročkai <me@mornfall.net> added the comment:
>
> Hi,
>
> Thomas Hartman <bugs@darcs.net> writes:
>> What and where are the  CLONE_FS (...) flags ?
>>
>> ...darcs.net $ find -type d -name _darcs -prune -o -name *.hs -print |
>> xargs -i grep CLONE {} # (no output)
> sorry, I was referring to the Linux system call clone, which is used to
> create new threads. In Linux, threads can be created in such a way that
> each thread has its own working directory. This would likely solve your
> problem of darcs calling and relying on setCurrentDirectory, IIUIC?
>
> (But it is problematic in other ways, since this does not seem to be
> available from neither pthread API nor GHC threading API...)
>
> Yours,
>   Petr.
>
> __________________________________
> Darcs bug tracker <bugs@darcs.net>
> <http://bugs.darcs.net/patch127>
> __________________________________
> _______________________________________________
> darcs-users mailing list
> darcs-users@darcs.net
> http://lists.osuosl.org/mailman/listinfo/darcs-users
>
msg10252 (view) Author: dagit Date: 2010-03-18.02:46:57
Could we please get a status update?  Looks like this has been sitting 
since about mid-January.

Thanks!
Jason
msg10613 (view) Author: duncan Date: 2010-04-01.20:25:30
Here's a crazy idea: use the POSIX "*at" functions, see man openat for
an explanation and links to the related functions. The API can be
emulated on Windows at the performance cost that has already been noted.

The downside of course would be not using the ordinary System.*
functions and having to write your own binding. Of course such a binding
would be useful more widely, probably best as an implementation
substrate in some new IO lib that has an abstract FilePath type.
msg10623 (view) Author: ganesh Date: 2010-04-02.10:30:23
Let's have further discussion of the issues with absolute/relative 
directories on issue1774.
msg10624 (view) Author: kowey Date: 2010-04-02.10:33:39
On Fri, Apr 02, 2010 at 10:30:24 +0000, Ganesh Sittampalam wrote:
> Let's have further discussion of the issues with absolute/relative 
> directories on issue1774.

Preferably darcs-users :-)

[If I'm sufficiently alert, I try to steer discussions towards the list
so that the tracker can be largely reserved to tracking]

-- 
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9
msg10653 (view) Author: ganesh Date: 2010-04-03.13:06:04
Given the complexity of the issue, please drop the current directory 
changes from this patch; they should be dealt with independently and with 
a comprehensive solution across the entire codebase.

Other than that I requested some changes in msg9720.
msg10674 (view) Author: thomashartman1 Date: 2010-04-05.08:11:41
I still would like this functionality, but I agree that the initial
patch was maybe too ambitious. I'll try to give it another go this
week.

2010/4/3 Ganesh Sittampalam <bugs@darcs.net>:
>
> Ganesh Sittampalam <ganesh@earth.li> added the comment:
>
> Given the complexity of the issue, please drop the current directory
> changes from this patch; they should be dealt with independently and with
> a comprehensive solution across the entire codebase.
>
> Other than that I requested some changes in msg9720.
>
> ----------
> assignedto: ganesh -> thomashartman1
> status: review-in-progress -> amend-requested
>
> __________________________________
> Darcs bug tracker <bugs@darcs.net>
> <http://bugs.darcs.net/patch127>
> __________________________________
> _______________________________________________
> darcs-users mailing list
> darcs-users@darcs.net
> http://lists.osuosl.org/mailman/listinfo/darcs-users
>
msg14933 (view) Author: kowey Date: 2012-01-01.00:33:46
Applied as-is:

Sat Jan  2 09:45:48 PST 2010  thomashartman1@gmail.com
  * make monad pipeline explicit

Eh, it's a bit churn-y as Ganesh points out, but still helpful here.  
The extra do will just have to be left for a future hlinter.

Reworked and resubmitted as-is in patch685:

Sat Jan  2 06:01:03 PST 2010  thomashartman1@gmail.com
  * in show files, break out manifestCmd' helper function with result 
type IO [FilePath], as baby step towards a useful darcs api

Sat Jan  2 08:11:55 PST 2010  thomashartman1@gmail.com
  * rename filesCmd' to filesAndDirsCmd (eg, full listing)

Sat Jan  2 08:13:08 PST 2010  thomashartman1@gmail.com
  * simplify output subfunction


Sat Jan  2 08:15:32 PST 2010  thomashartman1@gmail.com
  * simplify isParentDir subfunction from four tests to two, by bringing 
in System.FilePath.splitDirectories. Wonder about why prepending '.' is 
necessary in this test.

Sat Jan  2 08:32:43 PST 2010  thomashartman1@gmail.com
  * improve readability of onlysubdirs subcommand, by using any instead 
of the universal reader monad. also in the original, if anything arg 
suffixes should have been prefixes

Rejected [I'm afraid I didn't pay attention to/understand the 
discussion; please shout on the mailing list if it's important]

Sat Jan  2 09:54:26 PST 2010  thomashartman1@gmail.com
  * make slurpPristine use absolute paths (no more need to wrap in 
getCurrrentDirectory/withCurrentDirectory)

Sat Jan  2 10:05:24 PST 2010  thomashartman1@gmail.com
  * remove Repository.Pristine dependency on Workaround module (since no 
longer need getCurrentDirectory)
History
Date User Action Args
2010-01-02 18:18:28thomashartman1create
2010-01-02 22:19:22thomashartman1setfiles: + in-show-files_-break-out-manifestcmd_-helper-function-with-result-type-io-_filepath__-as-baby-step-towards-a-useful-darcs-api.dpatch
messages: + msg9718
title: patches to Darcs.Commands.ShowFiles, and repository internals that -> Re: patches to Darcs.Commands.ShowFiles, and repository internals
2010-01-03 20:59:09ganeshsetnosy: + ganesh
assignedto: ganesh
2010-01-03 21:24:21ganeshsetstatus: needs-review -> review-in-progress
messages: + msg9720
2010-01-11 23:43:46mornfallsetnosy: + mornfall
messages: + msg9789
2010-01-12 23:20:34thomashartman1setmessages: + msg9803
2010-01-13 07:13:28mornfallsetmessages: + msg9807
2010-01-14 20:21:32thomashartman1setmessages: + msg9823
2010-03-18 02:46:57dagitsetnosy: + dagit
messages: + msg10252
2010-04-01 20:25:30duncansetnosy: + duncan
messages: + msg10613
2010-04-01 20:47:48ganeshsetissues: + darcs library is not thread-safe (uses current working directory)
2010-04-02 10:30:23ganeshsetmessages: + msg10623
2010-04-02 10:33:40koweysetnosy: + kowey
messages: + msg10624
2010-04-03 13:06:04ganeshsetstatus: review-in-progress -> followup-requested
assignedto: ganesh -> thomashartman1
messages: + msg10653
2010-04-05 08:11:41thomashartman1setmessages: + msg10674
2012-01-01 00:33:46koweysetstatus: followup-requested -> obsoleted
nosy: - darcs-users
messages: + msg14933