darcs

Issue 777 put => malloc: resource exhausted (out of memory)

Title put => malloc: resource exhausted (out of memory)
Priority feature Status duplicate
Milestone Resolved in
Superseder apply: memory usage 10X bundle size, send memory usage 7X bundle size
View: 1539, 1540
Nosy List Serware, alex1, darcs-devel, dmitry.kurochkin, gwern, kowey, thorkilnaur, tommy, tux_rocker
Assigned To
Topics Performance

Created on 2008-04-01.23:04:33 by alex1, last changed 2009-10-23.23:43:07 by admin.

Files
File name Uploaded Type
darcs-100patches10KiB.ps tux_rocker, 2008-05-18.11:57:35 application/postscript
darcs-100patches10KiBand1patch15MiB.ps tux_rocker, 2008-05-18.11:57:58 application/postscript
darcs.ps tux_rocker, 2008-05-18.09:15:59 application/postscript
Messages
msg4167 (view) Author: alex1 Date: 2008-04-01.23:04:30
Darcs output:

$ time darcs put [...]
darcs(28475) malloc: *** mmap(size=1152278528) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
darcs: malloc: resource exhausted (out of memory)


real	2m25.229s
user	1m35.057s
sys	0m6.252s

Darcs 2 compiled with GHC 6.8.2:

$ darcs -v
2.0.0rc1 (unknown)
msg4175 (view) Author: tux_rocker Date: 2008-04-02.14:05:40
Could you give us a link to the repository on which darcs fails? If that's not
possible, can you give the size of the repository in megabytes and in number of
patches? And how big is the largest patch?
msg4176 (view) Author: alex1 Date: 2008-04-02.14:32:16
On 4/2/08, Reinier Lamers <bugs@darcs.net> wrote:
>  Could you give us a link to the repository on which darcs fails? If that's not
>  possible, can you give the size of the repository in megabytes and in number of
>  patches? And how big is the largest patch?

This is a commercial project, so no repo for you. But I can give you the rest:

* Size of repository including working directory: 181M
* Size of _darcs: 139M
* Number of patches: 6475
* Largest patch is 8MB compressed, 15MB decompressed.

The repo was created by cloning a Darcs 1 repo. So the exact sequence was:

$ darcs get ...
$ darcs put ...

Let me know if there is anything else I can do or provide.

Alexander.
msg4619 (view) Author: tux_rocker Date: 2008-05-09.18:18:38
I made a test repo with 6476 patches of which one is a binary patch of 15 MB.

When I darcs put on that, I get the following output:

777test reinier$ darcs put --debug-verbose ../777test-put
Creating repository
Beginning identifying repository .
Done identifying repository .
Beginning identifying repository .
Done identifying repository .
Identified darcs-1 repo: /Users/reinier/Documents/Programs/777test
Beginning reading inventory of repository /Users/reinier/Documents/Programs/777test
Done reading inventory of repository /Users/reinier/Documents/Programs/777test
darcs apply --all --repodir ../777test-put --debug
^C^CwithSignalsHandled: Interrupted!

Darcs tells us about the repository:

777test reinier$ darcs query repo
          Type: darcs
        Format: darcs-1.0
          Root: /Users/reinier/Documents/Programs/777test
      Pristine: PlainPristine "_darcs/pristine"
         Cache: thisrepo:/Users/reinier/Documents/Programs/777test
Default Remote: ../777test-put
   Num Patches: 6476

So apparently we need to look at "darcs apply" here.
msg4705 (view) Author: tux_rocker Date: 2008-05-14.18:22:45
This goes wrong because the "darcs apply" command first reads the whole patch
bundle from stdin, and only when all the input has been read, starts applying
it. This is because hGetContentsPS is strict. Am I right that hGetContentsPS
does not have, and is not supposed to have, the lazy I/O behavior of the
standard getContents?
msg4706 (view) Author: kowey Date: 2008-05-14.18:25:04
Adding David and Gwern to this bug.  Can either of you two answer Reinier's
question?  Jason?

> Am I right that hGetContentsPS does not have, and is not supposed to have,
> the lazy I/O behavior of the standard getContents?
msg4710 (view) Author: gwern Date: 2008-05-15.04:19:19
'getContents' is usually lazy, yes (the Prelude getContents is lazy, System.IO's
hGetContents is lazy, etc.).

But FastPackedString.hs *always* uses strict ByteStrings (and
OldFastPackedStrings.hs is strict as well, I think, from the looks of it). 

http://haskell.org/ghc/docs/latest/html/libraries/bytestring/Data-ByteString-Char8.html#v%3AhGetContents

"Read entire handle contents into a ByteString. This function reads chunks at a
time, doubling the chunksize on each read. The final buffer is then realloced to
the appropriate size. For files > half of available memory, this may lead to
memory exhaustion. Consider using readFile in this case."

Not lazy. If we're getting errors, perhaps one could try taking its advice about
readFile.
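
To make the strict/lazy distinction concrete, here is a minimal sketch using the public bytestring API rather than darcs's own FastPackedString module. The helper names (sizeStrict, sizeLazy, fileSizeStrict, fileSizeLazy) are made up for illustration and do not exist in darcs.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL
import Data.Int (Int64)
import System.IO (Handle, IOMode (ReadMode), withBinaryFile)

-- | Strict read: the entire input must fit in memory before any work starts.
sizeStrict :: Handle -> IO Int
sizeStrict h = BS.length <$> BS.hGetContents h

-- | Lazy read: chunks are pulled on demand as 'BL.length' walks the input,
-- so chunks that have already been counted can be garbage collected.
sizeLazy :: Handle -> IO Int64
sizeLazy h = do
  contents <- BL.hGetContents h
  -- Force the count now, while the handle is still open.
  return $! BL.length contents

-- | Hypothetical convenience wrappers that open a file in binary mode.
fileSizeStrict :: FilePath -> IO Int
fileSizeStrict path = withBinaryFile path ReadMode sizeStrict

fileSizeLazy :: FilePath -> IO Int64
fileSizeLazy path = withBinaryFile path ReadMode sizeLazy
```

Both functions return the same number; the difference is peak memory. The strict version holds the whole input at once, while the lazy version only needs the chunk currently being counted.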
msg4730 (view) Author: tux_rocker Date: 2008-05-16.21:16:40
So if we use strict bytestrings, that limits the size of the patch bundle to
available working memory. To do something about that, you'd have to read the
patches one by one, I suppose. Are there any patch-theoretical objections to
doing that?
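
A rough sketch of what "one by one" could look like with a lazy ByteString follows. It assumes a made-up record format of one "patch" per line, which is NOT the real darcs bundle format; applyAll and countPatch are hypothetical names.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BLC
import Data.Int (Int64)
import Data.List (foldl')

-- | Fold a step function over the records of a lazily read stream.
-- Assumption: one record ("patch") per line; real bundles are not line-based.
-- With a strict left fold, only the accumulator and the record currently
-- being processed need to be live.
applyAll :: (s -> BLC.ByteString -> s) -> s -> BLC.ByteString -> s
applyAll step initial = foldl' step initial . BLC.lines

-- | Example step: keep a running count of records and of their total size.
countPatch :: (Int, Int64) -> BLC.ByteString -> (Int, Int64)
countPatch (n, bytes) patch =
  let n'     = n + 1
      bytes' = bytes + BLC.length patch
  in n' `seq` bytes' `seq` (n', bytes')
```

Reading from stdin would then be something like `applyAll countPatch (0, 0) <$> BLC.getContents`. Whether the real apply code can be restructured this way is exactly the open question in this thread.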
msg4732 (view) Author: dagit Date: 2008-05-16.21:33:59
On Fri, May 16, 2008 at 2:16 PM, Reinier Lamers <bugs@darcs.net> wrote:

> Reinier Lamers <tux_rocker@reinier.de> added the comment:
>
> So if we use strict bytestrings, that limits the size of the patch bundle to
> available working memory. To do something about that, you'd have to read the
> patches one by one, I suppose. Are there some patch theoretical objections to
> doing that?

No, I don't think there is any patch-theoretic problem here.  In the case of
patch application, I think it's just as straightforward as reading patches
and modifying the working copy (with perhaps some in-memory optimizations; I
don't know because I haven't looked at this code).

But, as I understand it, there can be problems with lazily reading files
that get modified while the read is happening.  Also, the code internally is
easier to verify and maintain if darcs assumes it can hold all the needed
patches in memory at once.

I recently asked Gwern if he could make --use-mmap (or --lazyio) or similar
a command line option so that the user can choose at run-time which way they
want darcs to do reading.  I suspect there are corner cases where either lazy
or strict IO is unsafe but still preferred because of resource usage.  It
could be that using mmap is dangerous in some cases (darcs could get a
segfault if a file is modified during reading, or lazy IO could result in a
file descriptor shortage) but that it may be the only way to make a specific
command work in some cases.  For these situations it would be nice if the
user could pick between the IO strategies.  Of course the drawback is the
added complexity.

Jason
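
As an illustration of the run-time choice proposed above, a small sketch might look like the following. Everything here is hypothetical: IOStrategy, parseStrategy and readBundle are invented names, and "--lazyio" is only the flag name floated in this message, not an existing darcs option. mmap is left out because it needs a package outside the bytestring library.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL

-- | Hypothetical run-time choice between the two reading strategies.
data IOStrategy = StrictIO | LazyIO
  deriving (Eq, Show)

-- | Hypothetical flag handling: default to strict IO unless the user
-- passes the (proposed, not existing) "--lazyio" flag.
parseStrategy :: [String] -> IOStrategy
parseStrategy args
  | "--lazyio" `elem` args = LazyIO
  | otherwise              = StrictIO

-- | Read a file with the chosen strategy. Both branches return a lazy
-- ByteString so that callers do not need to care which one was used.
readBundle :: IOStrategy -> FilePath -> IO BL.ByteString
readBundle StrictIO path = BL.fromStrict <$> BS.readFile path
readBundle LazyIO   path = BL.readFile path
```

The strict branch finishes with the file before returning, so later modifications cannot affect the result. The lazy branch keeps the file open until the contents are fully consumed, which is where the file descriptor concern mentioned above comes from.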
msg4733 (view) Author: tux_rocker Date: 2008-05-16.21:38:03
It turns out the size of the patch bundle was a red herring. My system already
swaps so heavily that it becomes unusable when only 11 MB of the bundle has been
generated. So the problem is with the code that generates the bundle. I'm going
to investigate that tomorrow.
msg4743 (view) Author: tux_rocker Date: 2008-05-18.09:15:59
I'm attaching the heap profile of running darcs put on a repository with a
single 15 MiB binary patch (the first 15 MiB of my hard disk, actually). It
looks as if it's allocating the patch content on the heap three times, once in
reading the patch file and then twice in creating the bundle.
msg4779 (view) Author: tux_rocker Date: 2008-05-19.19:33:40
It seems that hash_bundle is allocating a very large amount of heap space for
the data it takes a hash of. I'm going to try to modify that function so that
it doesn't need to hold all the data in memory at once.
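
The general idea of hashing without holding everything in memory can be sketched as below. This is not the actual hash_bundle code and it does not use the hash darcs uses; a simple 32-bit FNV-1a stands in so that the example needs nothing beyond base and bytestring. The names fnv1a and hashFile are made up.

```haskell
import Data.Bits (xor)
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word32)

-- | 32-bit FNV-1a, used here only as a stand-in for the real bundle hash.
-- The strict left fold consumes the lazy ByteString chunk by chunk, so the
-- hash state (one Word32) is all that has to stay in memory.
fnv1a :: BL.ByteString -> Word32
fnv1a = BL.foldl' step 2166136261
  where
    step h byte = (h `xor` fromIntegral byte) * 16777619

-- | Hash a file without loading it whole.
hashFile :: FilePath -> IO Word32
hashFile path = do
  contents <- BL.readFile path
  return $! fnv1a contents
```

The saving only holds if nothing else keeps a reference to the full contents. If the same lazy ByteString is also written out after hashing, the chunks stay alive and memory use goes back up.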
msg4829 (view) Author: droundy Date: 2008-05-22.10:31:59
darcs put requires that the entire parsed repository be held in memory.  For any
reasonably-large repository, this is impossible.  Changing this requires a
complete rewrite of put.
msg7059 (view) Author: thorkilnaur Date: 2009-01-12.11:10:34
tux_rocker, you have assigned this issue to yourself: Is there anything that you 
wish to add to this issue?

Thanks and best regards
Thorkil
msg8388 (view) Author: kowey Date: 2009-08-23.10:47:00
So if I understand correctly, this can be broken down into generating a giant
patch bundle (issue1540) and consuming it with darcs apply (issue1539).

I recommend we move the discussion over to those bugs.
History
Date User Action Args
2008-04-01 23:04:33 alex1 create
2008-04-02 14:05:42 tux_rocker set status: unread -> unknown
    nosy: + tux_rocker
    messages: + msg4175
2008-04-02 14:32:18 alex1 set nosy: + serware
    messages: + msg4176
2008-04-25 17:43:26 kowey set topic: + Performance
    nosy: + serware, - serware
2008-05-09 18:18:40 tux_rocker set nosy: + dagit
    messages: + msg4619
2008-05-14 08:40:47 kowey set topic: - Mac
    nosy: tommy, beschmi, kowey, dagit, alex1, tux_rocker, serware, Serware
    title: Put runs out of memory -> put => malloc: resource exhausted (out of memory)
2008-05-14 18:01:46 tux_rocker set nosy: tommy, beschmi, kowey, dagit, alex1, tux_rocker, serware, Serware
    assignedto: tux_rocker
2008-05-14 18:22:46 tux_rocker set nosy: tommy, beschmi, kowey, dagit, alex1, tux_rocker, serware, Serware
    messages: + msg4705
2008-05-14 18:25:06 kowey set nosy: + droundy, gwern
    messages: + msg4706
2008-05-15 04:19:21 gwern set nosy: droundy, tommy, beschmi, kowey, dagit, alex1, gwern, tux_rocker, serware, Serware
    messages: + msg4710
2008-05-16 21:16:42 tux_rocker set nosy: droundy, tommy, beschmi, kowey, dagit, alex1, gwern, tux_rocker, serware, Serware
    messages: + msg4730
2008-05-16 21:34:01 dagit set files: + unnamed
    nosy: droundy, tommy, beschmi, kowey, dagit, alex1, gwern, tux_rocker, serware, Serware
    messages: + msg4732
2008-05-16 21:38:06 tux_rocker set nosy: droundy, tommy, beschmi, kowey, dagit, alex1, gwern, tux_rocker, serware, Serware
    messages: + msg4733
2008-05-18 09:13:05 tux_rocker set files: - unnamed
    nosy: droundy, tommy, beschmi, kowey, dagit, alex1, gwern, tux_rocker, serware, Serware
2008-05-18 09:16:00 tux_rocker set files: + darcs.ps
    nosy: droundy, tommy, beschmi, kowey, dagit, alex1, gwern, tux_rocker, serware, Serware
    messages: + msg4743
2008-05-18 11:57:36 tux_rocker set files: + darcs-100patches10KiB.ps
    nosy: droundy, tommy, beschmi, kowey, dagit, alex1, gwern, tux_rocker, serware, Serware
2008-05-18 11:57:59 tux_rocker set files: + darcs-100patches10KiBand1patch15MiB.ps
    nosy: droundy, tommy, beschmi, kowey, dagit, alex1, gwern, tux_rocker, serware, Serware
2008-05-19 19:33:42 tux_rocker set nosy: droundy, tommy, beschmi, kowey, dagit, alex1, gwern, tux_rocker, serware, Serware
    messages: + msg4779
2008-05-22 10:32:01 droundy set nosy: droundy, tommy, beschmi, kowey, dagit, alex1, gwern, tux_rocker, serware, Serware
    messages: + msg4829
2008-05-22 10:34:19 droundy set priority: bug -> feature
    nosy: - droundy
2009-01-12 11:10:36 thorkilnaur set nosy: + serware, dmitry.kurochkin, simon, thorkilnaur, - serware
    messages: + msg7059
2009-08-06 21:01:40 admin set nosy: - beschmi
2009-08-11 00:09:30 admin set nosy: - dagit
2009-08-17 17:12:46 kowey set topic: - Darcs2
    nosy: tommy, kowey, simon, alex1, thorkilnaur, gwern, tux_rocker, dmitry.kurochkin, serware, Serware
2009-08-23 10:47:02 kowey set status: unknown -> duplicate
    nosy: tommy, kowey, simon, alex1, thorkilnaur, gwern, tux_rocker, dmitry.kurochkin, serware, Serware
    superseder: + apply: memory usage 10X bundle size, send memory usage 7X bundle size
    messages: + msg8388
    assignedto: tux_rocker ->
2009-08-25 17:38:56 admin set nosy: + darcs-devel, - simon
2009-08-27 14:30:21 admin set nosy: tommy, kowey, darcs-devel, alex1, thorkilnaur, gwern, tux_rocker, dmitry.kurochkin, serware, Serware
2009-10-23 22:34:07 admin set nosy: + alexander, - alex1
2009-10-23 22:42:41 admin set nosy: - Serware
2009-10-23 23:28:01 admin set nosy: + Serware, - serware
2009-10-23 23:43:07 admin set nosy: + alex1, - alexander