darcs

Issue 80 memory usage is up to 20X patch size

Title memory usage is up to 20X patch size
Priority urgent Status needs-testcase
Milestone Resolved in
Superseder wish: "chunky" representation for hunks
View: 1357
Nosy List Serware, darcs-devel, dmitry.kurochkin, gwern, kowey, lukeworth, markstos, thorkilnaur, tommy, zooko
Assigned To
Topics Performance

Created on 2006-01-04.17:14:56 by zooko, last changed 2020-07-30.17:55:28 by bfrk.

Messages
msg300 (view) Author: zooko Date: 2006-01-04.17:14:55
I created a 0.5 GiB file and ran darcs add on it.  darcs consumed 1.5 GiB of RAM
during the darcs add and again during the darcs record.  The record also took 35
minutes of maximum CPU on my high-powered workstation before I killed it.

It would be nice if darcs required enough RAM to store "only" one copy of the
patch.  It would be nicer if darcs required less RAM -- using instead a fixed
maximum buffer of RAM and lazily processing the file as needed.
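
A minimal sketch of the fixed-buffer idea, written in Python for brevity rather than in Haskell. It is not taken from darcs; the function name, the 1 MiB chunk size, and the choice of SHA-256 as the per-chunk work are all assumptions made for illustration. Peak memory stays near one chunk regardless of file size.

```python
import hashlib

CHUNK_SIZE = 1 << 20  # 1 MiB buffer; an arbitrary choice for illustration


def hash_file_in_chunks(path, chunk_size=CHUNK_SIZE):
    """Return (sha256 hex digest, total bytes read) for the file at path.

    Only one chunk of at most chunk_size bytes is held in memory at a time.
    """
    digest = hashlib.sha256()
    total = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:  # empty read means end of file
                break
            digest.update(chunk)
            total += len(chunk)
    return digest.hexdigest(), total
```

Calling `hash_file_in_chunks("0.5_GiB_file_a")` on the file from the transcript below would read it 1 MiB at a time instead of loading all 536870912 bytes at once.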

Hopefully the fact that laziness is one of the oldest, core features of the
design of Haskell means that it is relatively easy for programmers to implement
algorithms that do not eagerly consume RAM?

The extreme CPU usage is perplexing.  Are we trying to match the entire
contents of the binary file against a regex or something?

Regards,

Zooko

DARC yumyum:/mnt/sdb1/zooko/tmp$ time head --bytes=`python -c 'print 2**29'` /dev/zero > 0.5_GiB_file_a

real    0m1.790s
user    0m0.116s
sys     0m0.820s
DARC yumyum:/mnt/sdb1/zooko/tmp$
DARC yumyum:/mnt/sdb1/zooko/tmp$ l
drwxr-xr-x  5 zooko zooko       200 Jan  4 12:29 ./..
drwxrwxr-x  6 zooko zooko       184 Jan  4 12:36 ./_darcs
drwxrwxr-x  3 zooko zooko       104 Jan  4 12:36 ./.
-rw-rw-r--  1 zooko zooko 536870912 Jan  4 12:36 ./0.5_GiB_file_a
DARC yumyum:/mnt/sdb1/zooko/tmp$ darcs add 0.5_GiB_file_a
DARC yumyum:/mnt/sdb1/zooko/tmp$ time darcs record
addfile ./0.5_GiB_file_a
Shall I record this patch? (1/?) [ynWsfqadjkc], or ? for help: y
binary ./0.5_GiB_file_a
Shall I record this patch? (2/?) [ynWsfqadjkc], or ? for help: y
What is the patch name? a
Do you want to add a long comment? [yn] n

Couldn't handle interrupt since darcs was in a sensitive job.
Couldn't handle interrupt since darcs was in a sensitive job.
Finished recording patch 'a'

real    35m49.529s
user    34m33.986s
sys     0m19.432s
DARC yumyum:/mnt/sdb1/zooko/tmp$
msg301 (view) Author: zooko Date: 2006-01-04.17:18:19
Hm.  I just noticed the "Finished recording patch" part.  That seems like a
separate bug, since I hit C-c after 35 minutes.

--Z

> DARC yumyum:/mnt/sdb1/zooko/tmp$ darcs add 0.5_GiB_file_a
> DARC yumyum:/mnt/sdb1/zooko/tmp$ time darcs record
> addfile ./0.5_GiB_file_a
> Shall I record this patch? (1/?) [ynWsfqadjkc], or ? for help: y
> binary ./0.5_GiB_file_a
> Shall I record this patch? (2/?) [ynWsfqadjkc], or ? for help: y
> What is the patch name? a
> Do you want to add a long comment? [yn] n
> 
> Couldn't handle interrupt since darcs was in a sensitive job.
> Couldn't handle interrupt since darcs was in a sensitive job.
> Finished recording patch 'a'
> 
> real    35m49.529s
> user    34m33.986s
> sys     0m19.432s
> DARC yumyum:/mnt/sdb1/zooko/tmp$
msg2951 (view) Author: markstos Date: 2008-01-31.04:03:24
I confirmed this issue with Darcs2 and the --darcs-2 format tonight, although
the memory usage reported was "only" twice the patch size. After 'record' ran
for about 2 minutes on a 1 GHz laptop, darcs bailed out with this error:

darcs: out of memory (requested 1074790400 bytes)

(That's about 1 GiB of RAM being requested.)

If the reason the files are being loaded into memory is to check for changes, it
seems like some special case improvements are possible:

- If it's an "add", of course the whole file is new. Maybe we can avoid loading
as much of it in this case?
- If it's a binary file, there's no need to look inside it, just to notice that
it changed, right?
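
The second point could be sketched as follows, in Python for brevity. This is a hypothetical illustration, not darcs code: the helper name `files_differ` and the 1 MiB chunk size are assumptions. It compares sizes first, then compares the contents chunk by chunk and stops at the first difference, so neither file is ever held in memory in full.

```python
import os


def files_differ(path_a, path_b, chunk_size=1 << 20):
    """Return True if the two files have different contents."""
    # Different sizes imply different contents; no need to read anything.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return True
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while True:
            chunk_a = file_a.read(chunk_size)
            chunk_b = file_b.read(chunk_size)
            if chunk_a != chunk_b:
                return True  # stop at the first differing chunk
            if not chunk_a:
                return False  # both files ended without a difference
```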
msg2981 (view) Author: droundy Date: 2008-01-31.16:26:55
We have previously had special-case code to enable record to run with less
memory, but I'm not convinced this is a good idea.  By design, some of darcs's
operations require that we hold parsed patches in memory, which will always
require more memory than the actual patch.  And I don't care for the idea of
allowing users to record a patch that they cannot unrecord.

The real solution here is to revamp our handling of hunk patches so that we
don't store them in memory as a list of lines, but instead as a solid chunk of
memory with a stored number of lines.
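
As a rough analogy only, the overhead of the two representations can be compared in Python. This says nothing about the actual Haskell data structures in darcs; the function names and the sample data are assumptions, and `sys.getsizeof` reports CPython object sizes, which differ from GHC heap costs.

```python
import sys


def size_as_line_list(data):
    """Approximate bytes used when data is stored as a list of lines."""
    lines = data.split(b"\n")
    # The list itself plus one separate bytes object per line.
    return sys.getsizeof(lines) + sum(sys.getsizeof(line) for line in lines)


def size_as_blob(data):
    """Approximate bytes used when data is one blob plus a line count."""
    line_count = data.count(b"\n")
    return sys.getsizeof(data) + sys.getsizeof(line_count)


# Hypothetical sample: many short lines, where per-line overhead dominates.
sample = b"short line\n" * 100_000

print("list of lines:", size_as_line_list(sample), "bytes")
print("single blob:  ", size_as_blob(sample), "bytes")
```

The shorter the lines, the more the per-object overhead of the list representation dominates the payload.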

David
msg3503 (view) Author: markstos Date: 2008-02-16.19:03:06
David has been working on a fix that replaces the in-memory storage of hunks
with a blob-based representation rather than a line-based one.

However, the work caused some regressions, so it is being paused now while we
work on a stable Darcs 2 release. I'm marking the Darcs 2 release as a superseder.
msg4675 (view) Author: kowey Date: 2008-05-14.13:03:10
Ok, darcs-2 is released, so I'm reviving this performance bug.
msg8416 (view) Author: kowey Date: 2009-08-23.18:02:42
Perhaps profiling would also be useful here?
msg22309 (view) Author: bfrk Date: 2020-07-30.17:54:00
Our oldest open issue and it is still a problem. I found that memory usage can actually be
as much as 20 times the patch size with somewhat smaller files.

>head --bytes=`python -c 'print 2**25'` /dev/zero > large
>time darcs add large      
Adding './large'
Finished adding:
./large
0,02s 20M
>time darcs record -am 32M --skip-long-comment
Finished recording patch '32M'
3,57s 648M
>head --bytes=`python -c 'print 2**26'` /dev/zero > large
>time darcs record -am 64M --skip-long-comment  
Finished recording patch '64M'
10,23s 1560M
>head --bytes=`python -c 'print 2**27'` /dev/zero > large
>time darcs record -am 128M --skip-long-comment
Finished recording patch '128M'
20,25s 3093M
>time echo yd|darcs obliterate
patch 927735e394244c8f0f042d51ceeb7102cca13206
Author: Ben Franksen <ben.franksen@online.de>
Date:   Thu Jul 30 19:50:24 CEST 2020
  * 128M
Shall I obliterate this patch? (1/3)  [ynW...], or ? for more options: patch 
a7b056521a8e2041c61494fe5f15c18f97e48f78
Author: Ben Franksen <ben.franksen@online.de>
Date:   Thu Jul 30 19:49:41 CEST 2020
  * 64M
Shall I obliterate this patch? (2/3)  [ynW...], or ? for more options: Finished obliterating.
81,83s 3185M
>time echo yd|darcs obliterate
patch a7b056521a8e2041c61494fe5f15c18f97e48f78
Author: Ben Franksen <ben.franksen@online.de>
Date:   Thu Jul 30 19:49:41 CEST 2020
  * 64M
Shall I obliterate this patch? (1/2)  [ynW...], or ? for more options: patch 
54f081bc4708d1ae886d7853b1c2e798aca54031
Author: Ben Franksen <ben.franksen@online.de>
Date:   Thu Jul 30 19:48:43 CEST 2020
  * large
Shall I obliterate this patch? (2/2)  [ynW...], or ? for more options: Finished obliterating.
0,00s 4M
37,33s 1384M

Note how much worse obliterate is versus record.
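
The ratio for each record run can be worked out from the figures in the transcript above. The sketch below assumes that the "M" suffix in the timing output means MiB of peak memory and that the patch size equals the file size; both are assumptions, not something the transcript states.

```python
MIB = 2 ** 20

# (label, file size in bytes, reported peak memory in MiB),
# copied from the darcs record runs in the transcript.
measurements = [
    ("record 32M", 2 ** 25, 648),
    ("record 64M", 2 ** 26, 1560),
    ("record 128M", 2 ** 27, 3093),
]


def memory_ratio(size_bytes, peak_mib):
    """Peak memory divided by file size, both expressed in MiB."""
    return peak_mib / (size_bytes / MIB)


for label, size_bytes, peak_mib in measurements:
    print(f"{label}: {memory_ratio(size_bytes, peak_mib):.2f}x")
```

Under those assumptions the 32 MiB file comes out at about 20 times its size and the two larger files at about 24 times.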
History
Date User Action Args
2006-01-04 17:14:56 zooko create
2006-01-04 17:18:20 zooko set status: unread -> unknown
    nosy: droundy, tommy, zooko
    messages: + msg301
2006-01-13 14:43:07 droundy set nosy: droundy, tommy, zooko
2006-01-13 14:43:15 droundy set priority: feature -> bug
    nosy: droundy, tommy, zooko
2007-07-16 09:27:11 kowey set topic: + Performance
    nosy: + kowey, beschmi
2008-01-31 04:03:28 markstos set topic: + Confirmed, Darcs2, IncludesExampleOrTest
    nosy: + markstos
    messages: + msg2951
    title: memory usage is 3X patch size, and darcs record took at least 35 minutes -> record: memory usage is 2X patch size
2008-01-31 16:27:00 droundy set nosy: droundy, tommy, beschmi, kowey, markstos, zooko
    messages: + msg2981
2008-02-16 18:59:22 markstos link issue172 superseder
2008-02-16 19:03:10 markstos set status: unknown -> deferred
    nosy: droundy, tommy, beschmi, kowey, markstos, zooko
    superseder: + Release Darcs 2.0
    messages: + msg3503
2008-05-14 13:03:16 kowey set status: deferred -> unknown
    nosy: + Serware, dagit
    superseder: - Release Darcs 2.0
    messages: + msg4675
2008-09-06 12:21:07 gwern set nosy: + gwern
2008-10-07 14:54:23 droundy set priority: bug -> feature
    nosy: + dmitry.kurochkin, simon, thorkilnaur
2008-10-07 14:54:37 droundy set nosy: droundy, tommy, beschmi, kowey, markstos, zooko, dagit, simon, thorkilnaur, gwern, dmitry.kurochkin, Serware
2009-08-06 17:39:41 admin set nosy: + jast, darcs-devel, mornfall, - droundy, gwern
2009-08-06 20:36:38 admin set nosy: - beschmi
2009-08-10 22:19:54 admin set nosy: + gwern, - darcs-devel, jast, mornfall
2009-08-11 00:10:48 admin set nosy: - dagit
2009-08-17 05:03:03 kowey set topic: - Darcs2
    nosy: tommy, kowey, markstos, zooko, simon, thorkilnaur, gwern, dmitry.kurochkin, Serware
2009-08-17 17:10:16 kowey set nosy: tommy, kowey, markstos, zooko, simon, thorkilnaur, gwern, dmitry.kurochkin, Serware
    superseder: + wish: "chunky" representation for hunks
2009-08-23 18:02:45 kowey set topic: - Confirmed, IncludesExampleOrTest
    nosy: tommy, kowey, markstos, zooko, simon, thorkilnaur, gwern, dmitry.kurochkin, Serware
    messages: + msg8416
2009-08-23 18:03:19 kowey set status: unknown -> needs-reproduction
    nosy: tommy, kowey, markstos, zooko, simon, thorkilnaur, gwern, dmitry.kurochkin, Serware
2009-08-25 17:30:33 admin set nosy: + darcs-devel, - simon
2009-08-26 18:00:07 kowey set nosy: tommy, kowey, markstos, darcs-devel, zooko, thorkilnaur, gwern, dmitry.kurochkin, Serware
2009-08-27 14:30:53 admin set nosy: tommy, kowey, markstos, darcs-devel, zooko, thorkilnaur, gwern, dmitry.kurochkin, Serware
2009-10-23 22:42:51 admin set nosy: + serware, - Serware
2009-10-23 23:28:47 admin set nosy: + Serware, - serware
2012-07-16 21:42:57 lukeworth set nosy: + lukeworth
2020-07-30 17:54:05 bfrk set priority: feature -> urgent
    messages: + msg22309
2020-07-30 17:54:47 bfrk set status: needs-reproduction -> needs-testcase
2020-07-30 17:55:28 bfrk set title: record: memory usage is 2X patch size -> memory usage is up to 20X patch size