darcs

Issue 755 darcs get --to-match "hash foo" inefficient

Title: darcs get --to-match "hash foo" inefficient
Priority: feature
Status: resolved
Milestone:
Resolved in:
Superseder:
Nosy List: darcs-devel, dmitry.kurochkin, kowey, thorkilnaur, tommy, tux_rocker, vmiklos
Assigned To: tux_rocker
Topics: Performance

Created on 2008-03-22.02:52:46 by vmiklos, last changed 2009-08-27.14:18:03 by admin.

Messages
msg3957 (view) Author: vmiklos Date: 2008-03-22.02:52:45
how to reproduce:

$ darcs get http://ftp.frugalware.org/pub/archive/other/darcs/frugalware-current/
$ darcs get --to-match "hash 20050127214749-3bee8-696de7886abbb1b27960790a414aaeb138f7206b.gz" frugalware-current tmp
darcs: failed to read patch in get_extra:
Sun Jan 14 04:08:03 UTC 2007  crazy <crazy@frugalware.org>
  * ndiswrapper-1.34-1-i686
  * Version bump
Perhaps this is a 'partial' repository?

'ndiswrapper-1.34-1-i686' is
20070114040803-f6986-0b0716f3dbe5ab62512842837a38a2bb741927d5.gz, you
can check it yourself.

$ ls frugalware-current/_darcs/patches/20070114040803-f6986-0b0716f3dbe5ab62512842837a38a2bb741927d5.gz
frugalware-current/_darcs/patches/20070114040803-f6986-0b0716f3dbe5ab62512842837a38a2bb741927d5.gz

so it seems to be there.

thanks
msg3959 (view) Author: tux_rocker Date: 2008-03-22.11:45:43
This works for me with the current darcs 2 on Mac OS X 10.4 Tiger (it does take
an obscene amount of time, though). What version of darcs are you running?
msg3961 (view) Author: vmiklos Date: 2008-03-22.12:03:10
oh, i forgot to mention:

$ darcs --version
1.0.9 (release)

maybe a lame question: is (ideally) darcs2 fully backwards-compatible? ie may i
replace darcs with darcs2 without modifying my darcs scripts?

thanks
msg3962 (view) Author: tommy Date: 2008-03-22.14:11:02
With a current darcs-stable the error message is instead something about
a resource limit exceeded. (I lost my shell with the exact error message.)
msg3977 (view) Author: vmiklos Date: 2008-03-23.23:10:04
ok, i tried now with darcs 2.0.0pre4, last time i checked the top output, it showed:

30133 vmiklos   20   0 1713m 806m  86m R 89.2 79.9  34:25.49 darcs

so 89% cpu, 1.7G swap (i have 2G) and 806M of memory (i have 1G).

i can leave it running for 2 days, hopefully i'll get an error message (or
success) till then.
msg3979 (view) Author: tux_rocker Date: 2008-03-24.13:55:22
My MacBook with 1G of RAM has managed to complete it in a couple of  
hours, so it seems safe to say that darcs 2 can do this. But I do  
wonder whether it is necessary to *first* get the whole repository  
and *then* throw away all but the first patch. Can't darcs just get  
the first patch and then realize that it's done?

Reinier
msg3981 (view) Author: vmiklos Date: 2008-03-24.14:25:34
On Mon, Mar 24, 2008 at 01:55:23PM -0000, Reinier Lamers <bugs@darcs.net> wrote:
> My MacBook with 1G of RAM has managed to complete it in a couple of  
> hours, so it seems safe to say that darcs 2 can do this. But I do  
> wonder whether it is necessary to *first* get the whole repository  
> and *then* throw away all but the first patch. Can't darcs just get  
> the first patch and then realize that it's done?

i think this is a design issue. though darcs2 finally did it here as
well:

22:29:12 [I] $ darcs get --quiet --to-match "hash 20050127214749-3bee8-696de7886abbb1b27960790a414aaeb138f7206b.gz" /home/vmiklos/tmp/frugalware-current /home/vmiklos/tmp/darcs 2>&1
05:12:54 [I] [Ok]

so it took almost 7 hours here on i686 with 1G of ram, which is not
acceptable imho.

thanks.
msg3985 (view) Author: droundy Date: 2008-03-24.16:18:51
This is mainly a situation of "don't do that".  We haven't optimized get --to-x
at all.  If you only want one patch, then pull it.

Of course, someone is welcome to implement a more efficient version of get.

David
msg3990 (view) Author: tux_rocker Date: 2008-03-24.22:02:09
I'm willing to take a stab at optimizing this, because I feel that it should work.

I can either try to make "darcs get" only the part of the repository you ask
for, or try to find a memory leak and fix that, or both.

David, do you think there may be a memory leak here? And what sort of approach
do you use for memory leaks in darcs (I have fixed laziness-related memory leaks
in Haskell before, but never in such big apps)?
msg3992 (view) Author: vmiklos Date: 2008-03-24.22:14:41
On Mon, Mar 24, 2008 at 10:02:10PM -0000, Reinier Lamers <bugs@darcs.net> wrote:
> I'm willing to take a stab at optimizing this, because I feel that it should work.
> 
> I can either try to make "darcs get" only the part of the repository you ask
> for, or try to find a memory leak and fix that, or both.

provided that there are about 25k patches in the repo, i think the
problem is that darcs tries to load all the patches to the memory.

in practice, i think all that would have to be done is to stop looking
at the inventory once there is a match. (but i'm not familiar with
darcs' internals.)
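
The "stop at the first match" idea can be sketched on a toy model. This is not darcs code: the inventory is assumed to be a plain list of patch hashes, oldest first, and `takeThrough` and `inventoryUpTo` are hypothetical helpers. Because Haskell lists are lazy, entries after the match are never inspected.

```haskell
-- Toy model, not darcs internals: an inventory is assumed to be a
-- list of patch hashes, oldest first.
type PatchHash = String

-- | Keep every element up to and including the first one that
--   satisfies the predicate; later elements are never inspected.
takeThrough :: (a -> Bool) -> [a] -> [a]
takeThrough _ [] = []
takeThrough p (x:xs)
  | p x       = [x]
  | otherwise = x : takeThrough p xs

-- | Hypothetical helper: the prefix of the inventory needed for a
--   get up to (and including) the patch with the given hash.
inventoryUpTo :: PatchHash -> [PatchHash] -> [PatchHash]
inventoryUpTo h = takeThrough (== h)
```

If the hash never matches, this sketch returns the whole list; a real implementation would presumably report an error instead.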
msg3996 (view) Author: droundy Date: 2008-03-25.13:46:45
On Mon, Mar 24, 2008 at 10:02:10PM -0000, Reinier Lamers wrote:
> I'm willing to take a stab at optimizing this, because I feel that it
> should work.
> 
> I can either try to make "darcs get" only the part of the repository you ask
> for, or try to find a memory leak and fix that, or both.
> 
> David, do you think there may be a memory leak here? And what sort of approach
> do you use for memory leaks in darcs (I have fixed laziness-related memory leaks
> in Haskell before, but never in such big apps)?

I doubt it's a memory leak, I think it's just what vmiklos said:  we're
holding the whole repository in memory.  And indeed, it may be that the fix
is pretty easy... except that I'm not sure how to ensure that the fix is
always an improvement.

Currently, we always get the entire repository and then obliterate the
patches we don't want.  This is intended to optimize for the common case
where you want to get a recent tag.  However, I'm not certain that with
hashed repositories it even does that.

Hmmm.  Actually, it really shouldn't leak... except that we're taking a
difference of two PatchSets (using get_common_and_uncommon), which is a
problem.  This is in Get, in go_to_chosen_version, and relates to the
choice to use the same code for get --tag and get --to-match.  The former
may require commutation to get the patches in the right order, but the
latter never can.

I think you might be successful just taking the output of get_one_patchset
and writing it to the repository.  This would require adding a new function
to Darcs.Repository, which would be implemented much like copyInventory,
which would look something like:

patchSetToRepository :: RepoPatch p => PatchSet p -> IO (Repository p)

which would just take a PatchSet and write it into a fresh repository
(overwriting any inventory or patches that might be present).  This would
work, and wouldn't be leaky, but would also involve parsing every patch in
the repository, which is a whole lot of needless work.  Actually, for
hashed repositories, it *wouldn't* require parsing of the patches, so maybe
you can figure that a performance regression for darcs1-format repositories
is worth it for the large memory-use improvement, and it may not seem
worthwhile adding extra complexity to handle the darcs1 case efficiently.
Also, over network connections, the CPU time may not matter... although
the common case for get --to-match is a local get.

Anyhow, the above patchSetToRepository function is how I'd improve this
get.  Modifying get itself would then be relatively easy.  Oh, and I'm not
sure whether or not we'd want to return a Repository from this function.  I
like the idea, though, as it allows you to easily obtain a Repository of
the right type that matches the repository that we're actually getting.
-- 
David Roundy
Department of Physics
Oregon State University
msg4008 (view) Author: vmiklos Date: 2008-03-25.21:03:29
On Mon, Mar 24, 2008 at 04:18:52PM -0000, David Roundy <bugs@darcs.net> wrote:
> This is mainly a situation of "don't do that".  We haven't optimized get --to-x
> at all.  If you only want one patch, then pull it.

just tried:

$ mkdir tmp
$ cd tmp
$ darcs init
$ darcs pull --match "hash 20050127214749-3bee8-696de7886abbb1b27960790a414aaeb138f7206b.gz" \
        /path/to/frugalware-current

eats the same amount of memory as well.

at least it seems.
msg4010 (view) Author: droundy Date: 2008-03-25.21:20:52
On Tue, Mar 25, 2008 at 09:03:30PM -0000, Miklos Vajna wrote:
> On Mon, Mar 24, 2008 at 04:18:52PM -0000, David Roundy <bugs@darcs.net> wrote:
> > This is mainly a situation of "don't do that".  We haven't optimized get --to-x
> > at all.  If you only want one patch, then pull it.
> 
> just tried:
> 
> $ mkdir tmp
> $ cd tmp
> $ darcs init
> $ darcs pull --match "hash 20050127214749-3bee8-696de7886abbb1b27960790a414aaeb138f7206b.gz" \
>         /path/to/frugalware-current
> 
> eats the same amount of memory as well.
> 
> at least it seems.

Oh, that could be.  In which case you'd have to get and then obliterate
(probably in multiple steps).
-- 
David Roundy
Department of Physics
Oregon State University
msg4015 (view) Author: tux_rocker Date: 2008-03-25.23:46:10
> Oh, that could be.  In which case you'd have to get and then obliterate
> (probably in multiple steps).

And in which case I'd probably better optimize pull first instead of get.
msg4016 (view) Author: vmiklos Date: 2008-03-26.01:37:00
On Tue, Mar 25, 2008 at 11:46:11PM -0000, Reinier Lamers <bugs@darcs.net> wrote:
> And in which case I'd probably better optimize pull first instead of get.

that would be nice. with this repo, pulling a single patch takes 2.5+
minutes:

02:14:50 [I] /home/vmiklos/tmp/darcs $ darcs pull --all --quiet --match "hash 20050127220056-3bee8-31fa158c59683569872ae11d9039d8d9b6c3acf2.gz" 2>&1
02:17:12 [I] [Ok]

if i create a wrapper to create a bundle containing only that patch and
then i use darcs apply, that's much faster. but it's just a workaround,
ideally i think darcs pull --match "hash foo" could be as fast as darcs
apply is.
msg4021 (view) Author: droundy Date: 2008-03-26.13:54:21
On Tue, Mar 25, 2008 at 11:46:11PM -0000, Reinier Lamers wrote:
> > Oh, that could be.  In which case you'd have to get and then obliterate
> > (probably in multiple steps).
> 
> And in which case I'd probably better optimize pull first instead of get.

No, get's actually much easier to optimize in this case, since it doesn't
allow interactive patch selection.  Interactive patch selection only works
by holding all the patches in memory, and pull doesn't have a special case
for non-interactive patch selection.  Optimizing interactive patch
selection (SelectChanges) to take advantage of situations like this where a
filter is applied to the set of patches could help a lot of darcs commands
(since most of them are interactive and support flags like --patch and
--match), but the SelectChanges is amazingly ugly and fragile to modify,
and also needs eventually to be rewritten to take advantage of darcs-2
semantics (when available).  So I don't think it's a good place to start.
Better to handle the simple case first, and only later to deal with that
monstrosity.
-- 
David Roundy
Department of Physics
Oregon State University
msg4041 (view) Author: tux_rocker Date: 2008-03-26.23:38:23
I don't really understand what you mean here. Most of the current  
code for get is organized so that it modifies the current directory  
with functions that just return IO (). In what way does returning the  
repository we are getting to help in here? Most of the current code  
constructs that Repository on the spot with a "withRepository opts $-  
\repository -> ...".

I'm reading this code for the first time, so it wouldn't surprise me  
if I were confusing things here.

Reinier
msg4042 (view) Author: tux_rocker Date: 2008-03-26.23:45:18
Some program is stripping quoted text that I send to the bug
tracker... does any of you know off the top of your head which
program that is?

Reinier
msg4045 (view) Author: droundy Date: 2008-03-27.00:12:05
On Wed, Mar 26, 2008 at 11:38:25PM -0000, Reinier Lamers wrote:
> I don't really understand what you mean here. Most of the current  
> code for get is organized so that it modifies the current directory  
> with functions that just return IO (). In what way does returning the  
> repository we are getting to help in here? Most of the current code  
> constructs that Repository on the spot with a "withRepository opts $-  
> \repository -> ...".
> 
> I'm reading this code for the first time, so it wouldn't surprise me  
> if I were confusing things here.

There's no real need to return a Repository, but the advantage of doing so
would be that we could eliminate a withRepository and potentially add a bit
of typesafety:  the type of the new "gotten" repository must be the same as
the type of the original repository.  I seem to recall some nasty hacking
in copyRepository to deal with this constraint.
-- 
David Roundy
Department of Physics
Oregon State University
msg4057 (view) Author: tux_rocker Date: 2008-03-27.21:29:47
Okay, I have now thrown together some code that handles "get
--to-match=blah" by getting a patchset from the remote repository and then
adding and applying all those patches.

The offending command in the original bug report now completes almost
instantly. But when I try to 'get --to-match' to the one-but-last
patch in that large repo, it takes about 7 minutes CPU time and more
than half a Gig of RAM. That's still less resources than 'get
--to-match'ing the first patch took, but it's quite a regression compared
to the behavior of the old code. And 'get --to-match'ing to a recent
version seems more common than 'get --to-match'ing to a really old
version.

Also, a 'darcs changes' or 'darcs whatsnew' in the freshly gotten  
repository takes really long - I haven't tried to let one complete,  
but they don't give any output in the first 20 seconds or so.

For your reference, here are the relevant changes to the code I made:
{
hunk ./src/Darcs/Commands/Get.lhs 31
-                                    SetScriptsExecutable, Quiet, Context ),
+                                    SetScriptsExecutable, Quiet, Context, OnePattern ),
hunk ./src/Darcs/Commands/Get.lhs 38
-                    tentativelyRemovePatches, patchSetToPatches,
-                    copyRepository, tentativelyAddToPending,
-                    finalizeRepositoryChanges, sync_repo )
-import Darcs.Repository.Format ( identifyRepoFormat,
+                          tentativelyRemovePatches, patchSetToPatches, patchSetToRepository,
+                          copyRepository, tentativelyAddToPending,
+                          finalizeRepositoryChanges, sync_repo )
+import Darcs.Repository.Format ( identifyRepoFormat, RepoFormat,
hunk ./src/Darcs/Commands/Get.lhs 154
+  if (not (null [p | OnePattern p <- opts]))
+    then withRepository opts $- \repository -> do
+      fromrepo <- identifyRepositoryFor  repository repodir
+      torepo <- get_one_patchset fromrepo opts >>= patchSetToRepository opts
+      return ()
+    else copy_repo_and_go_to_chosen_version opts repodir rfsource rf putInfo
+        where am_informative = not $ Quiet `elem` orig_opts
+              putInfo s = when am_informative $ putDocLn s
+
+get_cmd _ _ = fail "You must provide 'get' with either one or two arguments."
+
+-- called by get_cmd
+-- assumes that the target repo of the get is the current directory, and that an inventory in the
+-- right format has already been created.
+copy_repo_and_go_to_chosen_version :: [DarcsFlag] -> String -> RepoFormat -> RepoFormat -> (Doc -> IO ()) -> IO ()
+copy_repo_and_go_to_chosen_version opts repodir rfsource rf putInfo = do
hunk ./src/Darcs/Commands/Get.lhs 189
-      where am_informative = not $ Quiet `elem` orig_opts
-            putInfo s = when am_informative $ putDocLn s
hunk ./src/Darcs/Commands/Get.lhs 190
-get_cmd _ _ = fail "You must provide 'get' with either one or two arguments."
hunk ./src/Darcs/Repository.lhs 39
+                    patchSetToRepository,
hunk ./src/Darcs/Repository.lhs 74
+import Data.Either(either, Either(..))
hunk ./src/Darcs/Repository.lhs 204
+
+-- | patchSetToRepository takes a patch set, and writes a new repository in the current directory
+--   that contains all the patches in the patch set. This function is used when 'darcs get'ing a
+--   repository with the --to-match flag.
+patchSetToRepository :: RepoPatch p => [DarcsFlag] -> PatchSet p -> IO (Repository p)
+patchSetToRepository opts ps = do
+    maybeRepo <- maybeIdentifyRepository opts "."
+    let noRepoError e = error ("patchSetToRepository: no repository in current dir: " ++ e)
+        repo@(Repo todir repopts rf2 (DarcsRepository pristine c)) = either noRepoError id maybeRepo
+        -- piasFL = FL of PatchInfoAnd's
+        piasFL = reverseRL $ concatRL ps
+    sequence_ $ mapFL (tentativelyAddPatch repo opts) piasFL
+    apply_patches opts piasFL
+    finalizeRepositoryChanges repo
+    return repo
+
}
msg4062 (view) Author: droundy Date: 2008-03-28.13:51:29
On Thu, Mar 27, 2008 at 09:29:49PM -0000, Reinier Lamers wrote:
> Okay, I have now thrown together some code that handles "get
> --to-match=blah" by getting a patchset from the remote repository and then
> adding and applying all those patches.

This is definitely a needlessly inefficient approach.  In particular,
tentativelyAddPatch is much less efficient than what you could be doing.
The advantage of this function (and of your code, as a result) is that it
is independent of the repository format.  The disadvantage is that being
independent of repository format has a cost, particularly when combined
with the fact that this function has to work in "generic" situations.

> The offending command in the original bug report now completes almost
> instantly. But when I try to 'get --to-match' to the one-but-last
> patch in that large repo, it takes about 7 minutes CPU time and more
> than half a Gig of RAM. That's still less resources than 'get
> --to-match'ing the first patch took, but it's quite a regression compared
> to the behavior of the old code. And 'get --to-match'ing to a recent
> version seems more common than 'get --to-match'ing to a really old
> version.

Agreed, this is a serious regression.  But I'm confident you can do
better.  (See both above and below for suggestions...)

> Also, a 'darcs changes' or 'darcs whatsnew' in the freshly gotten  
> repository takes really long - I haven't tried to let one complete,  
> but they don't give any output in the first 20 seconds or so.

I suspect this probably relates to the state of pending in some way.  If
you have continued trouble with this, I could help you figure out what's
going on.  In particular, tentativelyAddPatch has some "interesting"
effects, which enable it to be used in Record, where any changes (of the
add/remove/replace genre) that are recorded have to be removed from the
sequence of pending changes.

> For your reference, here are the relevant changes to the code I made:
> {
> hunk ./src/Darcs/Commands/Get.lhs 31
> -                                    SetScriptsExecutable, Quiet, Context ),
> +                                    SetScriptsExecutable, Quiet, Context, OnePattern ),
> hunk ./src/Darcs/Commands/Get.lhs 38
> -                    tentativelyRemovePatches, patchSetToPatches,
> -                    copyRepository, tentativelyAddToPending,
> -                    finalizeRepositoryChanges, sync_repo )
> -import Darcs.Repository.Format ( identifyRepoFormat,
> +                          tentativelyRemovePatches, patchSetToPatches, patchSetToRepository,
> +                          copyRepository, tentativelyAddToPending,
> +                          finalizeRepositoryChanges, sync_repo )
> +import Darcs.Repository.Format ( identifyRepoFormat, RepoFormat,
> hunk ./src/Darcs/Commands/Get.lhs 154
> +  if (not (null [p | OnePattern p <- opts]))
> +    then withRepository opts $- \repository -> do
> +      fromrepo <- identifyRepositoryFor  repository repodir
> +      torepo <- get_one_patchset fromrepo opts >>= patchSetToRepository opts
> +      return ()
> +    else copy_repo_and_go_to_chosen_version opts repodir rfsource rf putInfo
> +        where am_informative = not $ Quiet `elem` orig_opts
> +              putInfo s = when am_informative $ putDocLn s
> +
> +get_cmd _ _ = fail "You must provide 'get' with either one or two arguments."
> +
> +-- called by get_cmd
> +-- assumes that the target repo of the get is the current directory, and that an inventory in the
> +-- right format has already been created.
> +copy_repo_and_go_to_chosen_version :: [DarcsFlag] -> String -> RepoFormat -> RepoFormat -> (Doc -> IO ()) -> IO ()
> +copy_repo_and_go_to_chosen_version opts repodir rfsource rf putInfo = do
> hunk ./src/Darcs/Commands/Get.lhs 189
> -      where am_informative = not $ Quiet `elem` orig_opts
> -            putInfo s = when am_informative $ putDocLn s
> hunk ./src/Darcs/Commands/Get.lhs 190
> -get_cmd _ _ = fail "You must provide 'get' with either one or two arguments."
> hunk ./src/Darcs/Repository.lhs 39
> +                    patchSetToRepository,
> hunk ./src/Darcs/Repository.lhs 74
> +import Data.Either(either, Either(..))
> hunk ./src/Darcs/Repository.lhs 204
> +
> +-- | patchSetToRepository takes a patch set, and writes a new repository in the current directory
> +--   that contains all the patches in the patch set. This function is used when 'darcs get'ing a
> +--   repository with the --to-match flag.
> +patchSetToRepository :: RepoPatch p => [DarcsFlag] -> PatchSet p -> IO (Repository p)
> +patchSetToRepository opts ps = do
> +    maybeRepo <- maybeIdentifyRepository opts "."
> +    let noRepoError e = error ("patchSetToRepository: no repository in current dir: " ++ e)
> +        repo@(Repo todir repopts rf2 (DarcsRepository pristine c)) = either noRepoError id maybeRepo
> +        -- piasFL = FL of PatchInfoAnd's
> +        piasFL = reverseRL $ concatRL ps
> +    sequence_ $ mapFL (tentativelyAddPatch repo opts) piasFL
> +    apply_patches opts piasFL
> +    finalizeRepositoryChanges repo
> +    return repo

The first problem here is that you use piasFL twice.  You never want to use
any large lazily-read data twice (and we lazily read all large data, where
"large" means more than a single patch) if you can avoid it, and here you
can.  So you should re-read the repository contents before doing the
apply_patches.  This is your memory leak, you're requiring that the entire
repository be held in memory, since you use it twice.  Rereading should be
fast, provided you read it from the local repository.
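
A toy illustration of the point about using lazily-read data twice. Patches are modelled as plain `Int`s and `readPatches` is a hypothetical stand-in for reading the repository; none of this is the darcs API. In `twoPassesShared` one list feeds both passes, so it has to stay alive between them. In `twoPassesReread` each pass gets its own list, which can be collected as it is consumed.

```haskell
import Data.List (foldl')

-- Hypothetical stand-in for lazily reading patches from a repository;
-- a real reader would perform lazy I/O here.
readPatches :: Int -> IO [Int]
readPatches n = return [1 .. n]

-- One read shared by two passes: the whole list stays reachable
-- until the second pass has finished.
twoPassesShared :: Int -> IO (Int, Int)
twoPassesShared n = do
  ps <- readPatches n
  let count = length ps          -- first pass forces the whole list
      total = foldl' (+) 0 ps    -- second pass still needs it
  count `seq` return (count, total)

-- Re-read before the second pass: neither list has to be retained
-- once its own pass has walked over it.
twoPassesReread :: Int -> IO (Int, Int)
twoPassesReread n = do
  count <- length <$> readPatches n
  total <- foldl' (+) 0 <$> readPatches n
  return (count, total)
```

Both functions return the same pair; the difference is only in how long the list must be kept, which is the trade-off described above (re-reading costs a second read but not the memory).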

Really, patchSetToRepository should be implemented in a
repository-format-specific manner, and shouldn't use the general functions
such as tentativelyAddPatch or finalizeRepositoryChanges.  You should be
able to look at copyInventory and copyFullRepository to get some idea how
to implement this efficiently.  Basically, we need to write out the
inventory (either hashed_inventory or inventory), and then need to copy
over whatever patches we need.  Once that is done, then we read the
newly-constructed repository, and apply all its patches to the pristine
cache, and then copy the pristine cache to the working directory.  The key
is that each of these operations can be done in O(1) memory (which is to
say, without holding the contents of more than one patch in memory... we'll
hold the PatchInfos of the entire repository in memory, but that's no
problem).
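
The apply step described above can be sketched under toy assumptions: the pristine state is a list of lines, a patch is a function on that state, and `loadPatch` is a hypothetical per-patch reader. Only the list of patch names (the analogue of the PatchInfos) is held for the whole run; each patch body is loaded when needed and dropped afterwards.

```haskell
import Control.Monad (foldM)

-- Toy model: the pristine tree is a list of lines; a patch edits it.
type Pristine = [String]
type Patch    = Pristine -> Pristine

-- Hypothetical per-patch reader: in this sketch the patch called
-- @name@ simply appends the line @name@.
loadPatch :: String -> IO Patch
loadPatch name = return (++ [name])

-- | Apply patches in inventory order, loading each one only when it
--   is needed and forcing the spine of the state after every step so
--   that no chain of unevaluated applications builds up.
applyAll :: [String] -> Pristine -> IO Pristine
applyAll names start = foldM step start names
  where
    step st name = do
      p <- loadPatch name
      let st' = p st
      length st' `seq` return st'
```

For example, `applyAll ["a", "b"] []` yields `["a", "b"]` in this model.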

It has occurred to me that you may actually want to move this into
Repository.Internal, but I'm not sure.  I like to keep all of the Repository
implementation that we can in Internal (which relates to the avoidance of
circular dependencies).
-- 
David Roundy
Department of Physics
Oregon State University
msg4216 (view) Author: tux_rocker Date: 2008-04-09.19:43:20
I finally got around to work on this again, and I now implemented what you told
me to do :-) for the darcs1-to-darcs1 case.

Results for getting --to-match the one-but-last patch of the repository:
real    2m34.641s
user    1m8.520s
sys     0m46.955s

And for the first:
real    0m0.878s
user    0m0.840s
sys     0m0.028s

And when just getting the whole thing without --to-match:
real    1m17.264s
user    0m28.706s
sys     0m32.746s

All of these commands use a constant amount of memory around 50MB (but a simple
get slightly more than a get --to-match on the one-but-last patch, which in turn
uses slightly more than a get --to-match on the first patch).
msg4217 (view) Author: vmiklos Date: 2008-04-09.19:53:48
On Wed, Apr 09, 2008 at 07:43:22PM -0000, Reinier Lamers <bugs@darcs.net> wrote:
> I finally got around to work on this again, and I now implemented what
> you told me to do :-) for the darcs1-to-darcs1 case.

great!

can i test it somehow? :)

also, am i right that i should NOT use darcs convert if i want to
test your patch?

thanks
msg4230 (view) Author: tux_rocker Date: 2008-04-12.20:52:11
I only have working code for the case of a darcs 1 repository without --partial.
If you really want it, I can give it to you of course.

I notice that when I convert the frugalware-current repository to darcs-2
format, darcs seems to have a better (or less bad) performance on the get to the
first patch. It completes in under 20 minutes, using only 140 MB of memory.
msg4859 (view) Author: vmiklos Date: 2008-05-25.23:31:52
On Sat, Apr 12, 2008 at 08:52:13PM -0000, Reinier Lamers <bugs@darcs.net> wrote:
> I only have working code for the case of a darcs 1 repository without --partial.
> If you really want it, I can give it to you of course.

Please do, I'm interested.

> I notice that when I convert the frugalware-current repository to darcs-2
> format, darcs seems to have a better (or less bad) performance on the get to the
> first patch. It completes in under 20 minutes, using only 140 MB of memory.

Hm, it does not work that well here:

01:03:39 [I] $ darcs get --quiet --to-match "hash 20050127214749-3bee8-696de7886abbb1b27960790a414aaeb138f7206b.gz" /home/vmiklos/tmp/frugalware-current /home/vmiklos/tmp/darcs 2>&1
01:29:05 [W] [Status -9]

And yes, this is a darcs-2 repo.

Based on dmesg, OOM killed darcs. The box has 1G of memory.

Thanks.
msg4861 (view) Author: tux_rocker Date: 2008-05-26.10:39:32
On 26 May 2008, at 1:31, Miklos Vajna wrote:

The code for the case of a darcs 1 repository without --partial is
already in the current darcs repository. Just 'darcs get
http://darcs.net' and you have it.

I'll take a look when I'm under Linux again (OS X never OOM-kills  
anything, just becomes unusably slow due to swapping, it seems).
msg4865 (view) Author: vmiklos Date: 2008-05-26.15:42:30
On Mon, May 26, 2008 at 10:39:35AM -0000, Reinier Lamers <bugs@darcs.net> wrote:
> The code for the case of a darcs 1 repository without --partial is
> already in the current darcs repository. Just 'darcs get
> http://darcs.net' and you have it.

Ok, I'll give it a try.

> I'll take a look when I'm under Linux again (OS X never OOM-kills  
> anything, just becomes unusably slow due to swapping, it seems).

I worked around it by adding more swap. The problem is that now a darcs
pull --match hash takes about 12 minutes:

2008-05-26 15:03:44     INFO: /home/vmiklos/tmp/darcs $ darcs pull --all --quiet --match "hash 20050327212330-dc098-fecfa60fdb5fcdf2045cf5d5c1e26010c0f89821.gz" 2>&1
2008-05-26 15:15:45     INFO: [Ok]

so that would take 7 months ;-)

After building darcs from darcs, it's faster but still slow:

17:39:34 [I] /home/vmiklos/tmp/darcs $ darcs pull --all --quiet --match "hash 20050427220917-3bee8-96e0ceb55010f380d6c1488ecb052682bb9f3f46.gz" 2>&1
17:40:47 [I] [Ok]

so that would take about 22 days.
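
Both estimates come from multiplying the time for one pull by the number of patches. A quick check, assuming the roughly 25k patches mentioned earlier in the thread (an approximate figure) and the elapsed times read off the two log excerpts above:

```haskell
-- Rough extrapolation: seconds per pull times number of patches,
-- converted to days. The 25000 patch count is the approximate
-- figure quoted earlier in the thread, not an exact number.
patchCount :: Double
patchCount = 25000

daysFor :: Double -> Double
daysFor secondsPerPull = secondsPerPull * patchCount / 86400

-- 15:03:44 -> 15:15:45 is 12 min 1 s (721 s);
-- 17:39:34 -> 17:40:47 is 1 min 13 s (73 s).
oldDarcsDays, newDarcsDays :: Double
oldDarcsDays = daysFor 721   -- about 209 days, roughly 7 months
newDarcsDays = daysFor 73    -- about 21 days
```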

The strange fact is that if I use a wrapper and internally use darcs
apply, it's much faster:

http://vmiklos.hu/project/tailor/tailor-wrap/darcs

so I think there is some bug still somewhere.

Thanks.
msg6258 (view) Author: droundy Date: 2008-10-07.14:50:46
I think this is fixed with tux_rocker's patches...
History
Date User Action Args
2008-03-22 02:52:46  vmiklos  create
2008-03-22 10:25:36  tux_rocker  set  priority: bug
    nosy: tommy, beschmi, kowey, vmiklos
2008-03-22 10:25:52  tux_rocker  set  nosy: + tux_rocker
2008-03-22 11:45:44  tux_rocker  set  status: unread -> unknown
    nosy: tommy, beschmi, kowey, vmiklos, tux_rocker
    messages: + msg3959
2008-03-22 12:03:11  vmiklos  set  nosy: tommy, beschmi, kowey, vmiklos, tux_rocker
    messages: + msg3961
2008-03-22 14:11:04  tommy  set  nosy: tommy, beschmi, kowey, vmiklos, tux_rocker
    messages: + msg3962
2008-03-23 23:10:05  vmiklos  set  nosy: tommy, beschmi, kowey, vmiklos, tux_rocker
    messages: + msg3977
2008-03-24 13:55:23  tux_rocker  set  nosy: tommy, beschmi, kowey, vmiklos, tux_rocker
    messages: + msg3979
2008-03-24 14:25:35  vmiklos  set  nosy: tommy, beschmi, kowey, vmiklos, tux_rocker
    messages: + msg3981
2008-03-24 16:18:52  droundy  set  priority: bug -> feature
    nosy: + droundy
    messages: + msg3985
    title: darcs get --to-match "hash foo" broken for large repos -> darcs get --to-match "hash foo" inefficient
2008-03-24 22:02:10  tux_rocker  set  nosy: droundy, tommy, beschmi, kowey, vmiklos, tux_rocker
    messages: + msg3990
    assignedto: tux_rocker
2008-03-24 22:14:44  vmiklos  set  nosy: droundy, tommy, beschmi, kowey, vmiklos, tux_rocker
    messages: + msg3992
2008-03-25 13:46:49  droundy  set  nosy: droundy, tommy, beschmi, kowey, vmiklos, tux_rocker
    messages: + msg3996
2008-03-25 21:03:30  vmiklos  set  nosy: droundy, tommy, beschmi, kowey, vmiklos, tux_rocker
    messages: + msg4008
2008-03-25 21:20:53  droundy  set  nosy: droundy, tommy, beschmi, kowey, vmiklos, tux_rocker
    messages: + msg4010
2008-03-25 23:46:11  tux_rocker  set  nosy: droundy, tommy, beschmi, kowey, vmiklos, tux_rocker
    messages: + msg4015
2008-03-26 01:37:03  vmiklos  set  nosy: droundy, tommy, beschmi, kowey, vmiklos, tux_rocker
    messages: + msg4016
2008-03-26 13:54:24  droundy  set  nosy: droundy, tommy, beschmi, kowey, vmiklos, tux_rocker
    messages: + msg4021
2008-03-26 23:38:25  tux_rocker  set  nosy: droundy, tommy, beschmi, kowey, vmiklos, tux_rocker
    messages: + msg4041
2008-03-26 23:45:19  tux_rocker  set  nosy: droundy, tommy, beschmi, kowey, vmiklos, tux_rocker
    messages: + msg4042
2008-03-27 00:12:06  droundy  set  nosy: droundy, tommy, beschmi, kowey, vmiklos, tux_rocker
    messages: + msg4045
2008-03-27 21:29:49  tux_rocker  set  nosy: droundy, tommy, beschmi, kowey, vmiklos, tux_rocker
    messages: + msg4057
2008-03-28 13:51:40  droundy  set  nosy: droundy, tommy, beschmi, kowey, vmiklos, tux_rocker
    messages: + msg4062
2008-03-28 20:57:31  droundy  link  issue411 superseder
2008-03-28 20:57:44  droundy  set  topic: + Performance
    nosy: droundy, tommy, beschmi, kowey, vmiklos, tux_rocker
2008-04-08 06:56:03  tux_rocker  set  status: unknown -> has-patch
    nosy: droundy, tommy, beschmi, kowey, vmiklos, tux_rocker
2008-04-09 19:43:21  tux_rocker  set  nosy: droundy, tommy, beschmi, kowey, vmiklos, tux_rocker
    messages: + msg4216
2008-04-09 19:53:51  vmiklos  set  nosy: droundy, tommy, beschmi, kowey, vmiklos, tux_rocker
    messages: + msg4217
2008-04-12 20:52:13  tux_rocker  set  nosy: droundy, tommy, beschmi, kowey, vmiklos, tux_rocker
    messages: + msg4230
2008-04-22 19:49:41  tux_rocker  set  status: has-patch -> resolved
    nosy: droundy, tommy, beschmi, kowey, vmiklos, tux_rocker
2008-05-25 23:31:54  vmiklos  set  status: resolved -> unknown
    nosy: + dagit
    messages: + msg4859
2008-05-26 10:39:35  tux_rocker  set  nosy: droundy, tommy, beschmi, kowey, vmiklos, dagit, tux_rocker
    messages: + msg4861
2008-05-26 15:42:33  vmiklos  set  nosy: droundy, tommy, beschmi, kowey, vmiklos, dagit, tux_rocker
    messages: + msg4865
2008-10-07 14:50:48  droundy  set  status: unknown -> resolved-in-unstable
    nosy: + dmitry.kurochkin, simon, thorkilnaur
    messages: + msg6258
2009-04-22 03:28:30  twb  set  status: resolved-in-unstable -> resolved
    nosy: droundy, tommy, beschmi, kowey, vmiklos, dagit, simon, thorkilnaur, tux_rocker, dmitry.kurochkin
2009-08-06 17:57:20  admin  set  nosy: + markstos, jast, Serware, darcs-devel, zooko, mornfall, - droundy, vmiklos, tux_rocker
2009-08-06 21:00:49  admin  set  nosy: - beschmi
2009-08-10 22:18:30  admin  set  nosy: + vmiklos, tux_rocker, - markstos, darcs-devel, zooko, jast, Serware, mornfall
2009-08-11 00:08:50  admin  set  nosy: - dagit
2009-08-25 17:30:31  admin  set  nosy: + darcs-devel, - simon
2009-08-27 14:18:03  admin  set  nosy: tommy, kowey, vmiklos, darcs-devel, thorkilnaur, tux_rocker, dmitry.kurochkin