
Issue 162 wish: Avoid heap exhaustion when importing large repositories

Title: wish: Avoid heap exhaustion when importing large repositories
Priority: wishlist
Status: given-up
Milestone:
Resolved in:
Superseder:
Nosy List: darcs-devel, dmitry.kurochkin, hejin, kowey, markstos, thorkilnaur, tommy
Assigned To:
Topics: Performance

Created on 2006-04-19.11:19:01 by hejin, last changed 2017-07-30.23:56:59 by gh.

Messages
msg621 Author: hejin Date: 2006-04-19.11:18:59
When doing the initial recording of a large repository, I get a "Heap
exhausted" error.

I can't read Haskell, so I'll have to do some guesswork here. It seems to me
that darcs builds up the patch in memory, making sure that dependencies are
all right, etc., before writing it to disk. For the initial patch, there
shouldn't be any dependencies to take care of, so the patch could just as well
be streamed to disk as you go, allowing for much larger repositories to be
darcs'ified. For the following patches, the current logic is fine, as they
typically won't be that big.
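
A minimal Haskell sketch of the streaming import described above, assuming a
purely illustrative layout (a "path size" header line followed by the raw file
contents); this is not darcs's patch format, and streamInitialImport is a
hypothetical name. Each file is copied into the patch file in fixed-size
chunks, so memory use is bounded by the chunk size rather than by the size of
the tree:

    import qualified Data.ByteString as B
    import qualified Data.ByteString.Char8 as BC
    import System.IO

    -- Illustrative layout only (not darcs's patch format): for each file,
    -- write a "path size" header line, then stream the raw contents in
    -- bounded chunks, so memory use never depends on the file's size.
    streamInitialImport :: FilePath -> [FilePath] -> IO ()
    streamInitialImport patchFile paths =
      withFile patchFile WriteMode $ \out -> mapM_ (writeEntry out) paths
      where
        chunkSize = 64 * 1024                    -- the bounded buffer
        writeEntry out path =
          withFile path ReadMode $ \inH -> do
            size <- hFileSize inH
            BC.hPutStrLn out (BC.pack (path ++ " " ++ show size))
            copyChunks out inH
        copyChunks out inH = do
          chunk <- B.hGet inH chunkSize
          if B.null chunk
            then return ()
            else B.hPut out chunk >> copyChunks out inH

This only demonstrates that an initial record could be written incrementally;
it says nothing about how darcs would later read or commute such a patch.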
msg2884 Author: markstos Date: 2008-01-30.02:20:22
Hejin,

Thanks for the report. Could you try again with one of the new pre-built
binaries for Darcs 2, along with using "darcs init --darcs-2" so the new
repository format is used? 

Thanks, 

   Mark
msg2973 Author: hejin Date: 2008-01-31.15:01:50
On Jan 30, 2008 3:20 AM, Mark Stosberg <bugs@darcs.net> wrote:
>
>
> Hejin,
>
> Thanks for the report. Could you try again with one of the new pre-built
> binaries for Darcs 2, along with using "darcs init --darcs-2" so the new
> repository format is used?

Hi Mark,

I can't see that the problem is even partially solved, although there
might be differences that I haven't noticed.

I created a directory structure of similar size to the one I worked on
two years ago when I filed the bug report: 200 MB. Did a 'darcs init',
followed by a 'darcs add -r' (took 19 seconds) and a 'darcs record'. The
last command resulted in darcs using up 50% of my CPU and 500 MB RAM
for more than 100 minutes - and it is still running. It wrote a chunk
of 95 MB to disk after 15 minutes, but hasn't written anything since.

Let me restate my idea from back then. As the initial patch is not
dependent upon earlier patches, you can just concatenate the files'
content, or zip it, or whatever, plus record some metadata (filenames,
paths, etc.). This can be done without keeping everything in memory -
you could keep a reasonably sized buffer and stream content to a patch
file as you read it. There is not much book-keeping to do during the
recording of the initial patch (I may be guessing here). I know that
my proposal isn't particularly in line with functional programming,
but there must be a way to do it in Haskell.
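
A possible answer to "there must be a way to do it in Haskell": lazy
ByteString I/O already provides the bounded-buffer behaviour described above
for a single file, since Data.ByteString.Lazy.readFile produces the contents
in fixed-size chunks on demand. A small sketch (appendFileContents is a
hypothetical helper, not darcs code):

    import qualified Data.ByteString.Lazy as BL
    import System.IO (Handle)

    -- Append one file's contents to an already-open patch handle.
    -- BL.readFile reads lazily in fixed-size chunks, so only a small
    -- buffer is resident at a time, no matter how large the file is.
    appendFileContents :: Handle -> FilePath -> IO ()
    appendFileContents patchH path = BL.readFile path >>= BL.hPut patchH

The usual caveat with lazy I/O (the input handle stays open until the
contents are fully consumed) would need handling in real code.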

Based on today's test, I would still conclude that darcs doesn't scale
well with respect to large projects. You should state that more clearly
in the FAQ. There are a few workarounds, of course, like doing the
initial import in small increments, say 10 MB at a time, but for a
project of 200 MB it is still very annoying.

Regards,

Hedin Meitil

msg4077 Author: droundy Date: 2008-03-28.19:44:56
Just to clarify here.  We did at one time support lazy operation in record (the
key was that you had to use --all, or specify 'a' at the beginning of the
interactive prompt), and it's possible to do so.  However, somewhere along the
way this feature broke, and I'm just as glad that it did.  I don't like the idea
of darcs creating patches that it can't hold in memory, as it *very* severely
limits what you can do with them, and I'd rather our users don't get stuck in a
situation where darcs has created a patch so big that it can't lift it.
msg4110 Author: hejin Date: 2008-03-31.10:45:43
On Fri, Mar 28, 2008 at 9:44 PM, David Roundy <bugs@darcs.net> wrote:
>
>
>  Just to clarify here.  We did at one time support lazy operation in record (the
>  key was that you had to use --all, or specify 'a' at the beginning of the
>  interactive prompt), and it's possible to do so.  However, somewhere along the
>  way this feature broke, and I'm just as glad that it did.  I don't like the idea
>  of darcs creating patches that it can't hold in memory, as it *very* severely
>  limits what you can do with them, and I'd rather our users don't get stuck in a
>  situation where darcs has created a patch so big that it can't lift it.

Just to clarify: I am only referring to the initial patch, the one
that corresponds to an 'import' in e.g. Subversion. As no processing
needs to take place during the creation of the initial patch, there is
no need to keep everything in memory.

Conceptually, I know that darcs is capable of applying patches in any
order, but I can't imagine a situation where you would apply the
initial patch after one of its successors. So, yes, I don't see all
patches as being equal. To me, it is meaningful to see the initial
patch as being different.

But it is your choice. I will continue to use darcs for my private
projects. I will however not suggest it for use at work, as it cannot
be used for projects with a large existing codebase, although I would
very much like to. Both with my current and a previous employer, I've
been in a position to suggest version control systems, but had to
disregard darcs for exactly this reason.

Regards,

Hedin Meitil

msg8905 Author: dagit Date: 2009-10-02.21:47:24
This came up again here:
http://lists.osuosl.org/pipermail/darcs-users/2009-October/021705.html

And Ian considered it a regression.

Darcs 2.3.0 on an old-fashioned-inventory repository on my machine took almost
2 GB of memory (see later messages in that thread from me).  So while we may not want to
fix this by allowing darcs to record patches that it cannot later work with, we
should do something about this issue in general.

I have ideas about making this better but no plan of attack (yet).  I'm still
assigning it to myself for now so that I don't lose track of it and because I
would like to keep hacking away on it till things improve substantially.

I have some notes about part of the problem in this thread:
http://lists.osuosl.org/pipermail/darcs-users/2009-September/021158.html
History
Date                 User      Action  Args
2006-04-19 11:19:01  hejin     create
2007-07-20 20:32:05  kowey     set     topic: + Performance; nosy: + beschmi, kowey
2008-01-30 02:20:23  markstos  set     status: unread -> waiting-for; nosy: + markstos; messages: + msg2884
2008-01-31 15:01:51  hejin     set     nosy: droundy, tommy, beschmi, kowey, markstos, hejin; messages: + msg2973
2008-02-16 23:06:30  markstos  set     status: waiting-for -> deferred; nosy: droundy, tommy, beschmi, kowey, markstos, hejin; title: Avoid heap exhaustion when importing large repositories -> wish: Avoid heap exhaustion when importing large repositories
2008-03-28 19:44:57  droundy   set     status: deferred -> wont-fix; nosy: droundy, tommy, beschmi, kowey, markstos, hejin; messages: + msg4077
2008-03-31 10:45:44  hejin     set     nosy: droundy, tommy, beschmi, kowey, markstos, hejin; messages: + msg4110
2009-08-06 17:42:29  admin     set     nosy: + jast, Serware, dmitry.kurochkin, darcs-devel, zooko, dagit, mornfall, simon, thorkilnaur, - droundy, hejin
2009-08-06 20:39:21  admin     set     nosy: - beschmi
2009-08-10 21:44:59  admin     set     nosy: + hejin, - darcs-devel, zooko, jast, dagit, Serware, mornfall
2009-08-25 17:55:49  admin     set     nosy: + darcs-devel, - simon
2009-08-27 13:57:16  admin     set     nosy: tommy, kowey, markstos, darcs-devel, hejin, thorkilnaur, dmitry.kurochkin
2009-10-02 21:47:27  dagit     set     status: wont-fix -> has-patch; nosy: + dagit; messages: + msg8905; assignedto: dagit
2010-05-01 23:13:47  dagit     set     nosy: - dagit; assignedto: dagit ->
2017-07-30 23:56:59  gh        set     status: has-patch -> given-up