Created on 2006-01-04.17:14:56 by zooko, last changed 2020-07-30.17:55:28 by bfrk.
msg300 |
Author: zooko |
Date: 2006-01-04.17:14:55 |
|
I created a 0.5 GiB file and ran darcs add on it. darcs consumed 1.5 GiB of RAM
during the darcs add and again during the darcs record. The record also ran for
35 minutes at maximum CPU on my high-powered workstation before I killed it.
It would be nice if darcs required only enough RAM to store one copy of the
patch. It would be nicer still if darcs required less RAM than that, using a
fixed maximum buffer of RAM and lazily processing the file as needed.
Since laziness is one of the oldest core features of Haskell's design, hopefully
it is relatively easy for programmers to implement algorithms that do not
eagerly consume RAM?
The extreme CPU usage is perplexing. Are we trying to match the entire
contents of the binary file against a regex or something?
Regards,
Zooko
DARC yumyum:/mnt/sdb1/zooko/tmp$ time head --bytes=`python -c 'print 2**29'` /dev/zero > 0.5_GiB_file_a
real 0m1.790s
user 0m0.116s
sys 0m0.820s
DARC yumyum:/mnt/sdb1/zooko/tmp$
DARC yumyum:/mnt/sdb1/zooko/tmp$ l
drwxr-xr-x 5 zooko zooko 200 Jan 4 12:29 ./..
drwxrwxr-x 6 zooko zooko 184 Jan 4 12:36 ./_darcs
drwxrwxr-x 3 zooko zooko 104 Jan 4 12:36 ./.
-rw-rw-r-- 1 zooko zooko 536870912 Jan 4 12:36 ./0.5_GiB_file_a
DARC yumyum:/mnt/sdb1/zooko/tmp$ darcs add 0.5_GiB_file_a
DARC yumyum:/mnt/sdb1/zooko/tmp$ time darcs record
addfile ./0.5_GiB_file_a
Shall I record this patch? (1/?) [ynWsfqadjkc], or ? for help: y
binary ./0.5_GiB_file_a
Shall I record this patch? (2/?) [ynWsfqadjkc], or ? for help: y
What is the patch name? a
Do you want to add a long comment? [yn] n
Couldn't handle interrupt since darcs was in a sensitive job.
Couldn't handle interrupt since darcs was in a sensitive job.
Finished recording patch 'a'
real 35m49.529s
user 34m33.986s
sys 0m19.432s
DARC yumyum:/mnt/sdb1/zooko/tmp$
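For illustration, the fixed-buffer idea requested above can be sketched as follows. This is a minimal Python sketch, not darcs code (darcs is written in Haskell): the function name `hash_file_in_chunks` and the 1 MiB buffer size are assumptions made for the example. It shows that a whole-file operation such as hashing needs only one chunk in memory at a time, regardless of file size.

```python
import hashlib
import os
import tempfile

CHUNK_SIZE = 1 << 20  # 1 MiB buffer; an arbitrary choice for this sketch


def hash_file_in_chunks(path, chunk_size=CHUNK_SIZE):
    """Return (sha256 hex digest, byte count), holding at most chunk_size bytes at once."""
    digest = hashlib.sha256()
    total = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:  # empty read means end of file
                break
            digest.update(chunk)
            total += len(chunk)
    return digest.hexdigest(), total


if __name__ == "__main__":
    # Small stand-in for the 0.5 GiB file of zeros used in the report.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * (4 * CHUNK_SIZE))
    try:
        hexdigest, size = hash_file_in_chunks(tmp.name)
        print(size, hexdigest)
    finally:
        os.unlink(tmp.name)
```

Peak memory here is bounded by the buffer size rather than the file size, which is the property the report asks for. Whether darcs' patch handling can be restructured this way is a separate question.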
|
msg301 |
Author: zooko |
Date: 2006-01-04.17:18:19 |
|
Hm. I just noticed the "Finished recording patch" part. That seems like a
separate bug: I hit C-c after 35 minutes, yet darcs still reported success.
--Z
> DARC yumyum:/mnt/sdb1/zooko/tmp$ darcs add 0.5_GiB_file_a
> DARC yumyum:/mnt/sdb1/zooko/tmp$ time darcs record
> addfile ./0.5_GiB_file_a
> Shall I record this patch? (1/?) [ynWsfqadjkc], or ? for help: y
> binary ./0.5_GiB_file_a
> Shall I record this patch? (2/?) [ynWsfqadjkc], or ? for help: y
> What is the patch name? a
> Do you want to add a long comment? [yn] n
>
> Couldn't handle interrupt since darcs was in a sensitive job.
> Couldn't handle interrupt since darcs was in a sensitive job.
> Finished recording patch 'a'
>
> real 35m49.529s
> user 34m33.986s
> sys 0m19.432s
> DARC yumyum:/mnt/sdb1/zooko/tmp$
|
msg2951 |
Author: markstos |
Date: 2008-01-31.04:03:24 |
|
I confirmed this issue with Darcs2 and the --darcs-2 format tonight, although
the memory usage reported was "only" twice the patch size. After 'record' ran
for about 2 minutes on a 1 GHz laptop, darcs bailed out with this error:
darcs: out of memory (requested 1074790400 bytes)
(That's about 1 GiB of RAM being requested.)
If the reason the files are being loaded into memory is to check for changes, it
seems like some special case improvements are possible:
- If it's an "add", of course the whole file is new. Maybe we can avoid loading
as much of it in this case?
- If it's a binary file, there's no need to look inside it, just to notice that
it changed, right?
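As an illustration of the second point, here is a minimal Python sketch of change detection that never holds a whole binary file in memory. It is not how darcs works internally: `files_differ`, `old_path` and `new_path` are hypothetical names, standing in for a previously recorded copy and the working copy.

```python
import os
import tempfile


def files_differ(old_path, new_path, chunk_size=1 << 20):
    """Return True if the two files differ, reading at most chunk_size bytes of each at a time."""
    # Cheap check first: different sizes mean different contents.
    if os.path.getsize(old_path) != os.path.getsize(new_path):
        return True
    with open(old_path, "rb") as old, open(new_path, "rb") as new:
        while True:
            old_chunk = old.read(chunk_size)
            new_chunk = new.read(chunk_size)
            if old_chunk != new_chunk:
                return True  # stop at the first differing chunk
            if not old_chunk:
                return False  # both files ended with no difference found


if __name__ == "__main__":
    # Two small temporary files that differ in their last byte.
    with tempfile.NamedTemporaryFile(delete=False) as a:
        a.write(b"\0" * 4096)
    with tempfile.NamedTemporaryFile(delete=False) as b:
        b.write(b"\0" * 4095 + b"\1")
    try:
        print(files_differ(a.name, b.name))
    finally:
        os.unlink(a.name)
        os.unlink(b.name)
```

The size comparison and the early exit keep the common cases cheap. The sketch only answers "did it change?", which is all the bullet above asks for in the binary case.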
|
msg2981 |
Author: droundy |
Date: 2008-01-31.16:26:55 |
|
We have previously had special-case code to enable record to run with less
memory, but I'm not convinced this is a good idea. By design, some of darcs'
operations require that we hold parsed patches in memory, which will always
require more memory than the actual patch. And I don't care for the idea of
allowing users to record a patch that they cannot unrecord.
The real solution here is to revamp our handling of hunk patches so that we
don't store them in memory as a list of lines, but instead as a solid chunk of
memory with a stored number of lines.
David
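To illustrate why the representation matters, the following Python sketch compares the two layouts described above: a list of per-line objects versus one solid chunk plus a stored line count. The function names are made up for the example, and the byte counts are specific to CPython's object model, so they say nothing exact about darcs' Haskell data structures. They only show that per-line bookkeeping adds overhead on top of the payload.

```python
import sys


def line_list_size(data):
    """Approximate bytes used when the data is stored as a list of line objects."""
    lines = data.splitlines(keepends=True)
    return sys.getsizeof(lines) + sum(sys.getsizeof(line) for line in lines)


def solid_chunk_size(data):
    """Approximate bytes used when the data is one blob plus a stored line count."""
    line_count = data.count(b"\n")
    return sys.getsizeof(data) + sys.getsizeof(line_count)


if __name__ == "__main__":
    # Many short lines make the per-line overhead dominate.
    data = b"".join(b"line %d\n" % i for i in range(100_000))
    print("payload bytes:     ", len(data))
    print("list of lines:     ", line_list_size(data))
    print("chunk + line count:", solid_chunk_size(data))
```

The shorter the lines, the larger the gap between the two numbers, since every line carries a fixed object header and a list slot.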
|
msg3503 |
Author: markstos |
Date: 2008-02-16.19:03:06 |
|
David has been working on a fix that replaces the line-based in-memory storage
of hunks with a blob-based one.
However, the work caused some regressions, so it is being paused now while we
work on a stable Darcs 2 release. I'm marking the Darcs 2 release as a superseder.
|
msg4675 |
Author: kowey |
Date: 2008-05-14.13:03:10 |
|
Ok, darcs-2 is released, so I'm reviving this performance bug.
|
msg8416 |
Author: kowey |
Date: 2009-08-23.18:02:42 |
|
Perhaps profiling would also be useful here?
|
msg22309 |
Author: bfrk |
Date: 2020-07-30.17:54:00 |
|
This is our oldest open issue, and it is still a problem. I found that memory
usage can actually be as much as 20 times the file size with slightly smaller files.
>head --bytes=`python -c 'print 2**25'` /dev/zero > large
>time darcs add large
Adding './large'
Finished adding:
./large
0,02s 20M
>time darcs record -am 32M --skip-long-comment
Finished recording patch '32M'
3,57s 648M
>head --bytes=`python -c 'print 2**26'` /dev/zero > large
>time darcs record -am 64M --skip-long-comment
Finished recording patch '64M'
10,23s 1560M
>head --bytes=`python -c 'print 2**27'` /dev/zero > large
>time darcs record -am 128M --skip-long-comment
Finished recording patch '128M'
20,25s 3093M
>time echo yd|darcs obliterate
patch 927735e394244c8f0f042d51ceeb7102cca13206
Author: Ben Franksen <ben.franksen@online.de>
Date: Thu Jul 30 19:50:24 CEST 2020
* 128M
Shall I obliterate this patch? (1/3) [ynW...], or ? for more options: patch
a7b056521a8e2041c61494fe5f15c18f97e48f78
Author: Ben Franksen <ben.franksen@online.de>
Date: Thu Jul 30 19:49:41 CEST 2020
* 64M
Shall I obliterate this patch? (2/3) [ynW...], or ? for more options: Finished obliterating.
81,83s 3185M
>time echo yd|darcs obliterate
patch a7b056521a8e2041c61494fe5f15c18f97e48f78
Author: Ben Franksen <ben.franksen@online.de>
Date: Thu Jul 30 19:49:41 CEST 2020
* 64M
Shall I obliterate this patch? (1/2) [ynW...], or ? for more options: patch
54f081bc4708d1ae886d7853b1c2e798aca54031
Author: Ben Franksen <ben.franksen@online.de>
Date: Thu Jul 30 19:48:43 CEST 2020
* large
Shall I obliterate this patch? (2/2) [ynW...], or ? for more options: Finished obliterating.
0,00s 4M
37,33s 1384M
Note how much worse obliterate is than record.
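The ratios behind the "20 times" figure can be worked out from the record timings above. This small Python sketch assumes that the second column printed after each command (648M, 1560M, 3093M) is peak memory in MiB and that the files are exactly 2**25, 2**26 and 2**27 bytes, as the head commands suggest. The dictionary and the helper name are illustrative only.

```python
# File size in MiB -> memory figure printed after "darcs record", as reported above.
# Assumption: the "M" column is peak memory in MiB.
reported = {32: 648, 64: 1560, 128: 3093}


def memory_ratio(file_mib, memory_mib):
    """Return how many times larger the memory figure is than the file."""
    return memory_mib / file_mib


if __name__ == "__main__":
    for file_mib, memory_mib in reported.items():
        ratio = memory_ratio(file_mib, memory_mib)
        print(f"{file_mib:>4} MiB file: {memory_mib:>5} MiB used, ratio {ratio:.1f}")
```

Under those assumptions the ratio is a little over 20 for the 32 MiB file and about 24 for the two larger ones, so memory use grows roughly linearly with file size at this scale.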
|
|
Date | User | Action | Args |
2006-01-04 17:14:56 | zooko | create | |
2006-01-04 17:18:20 | zooko | set | status: unread -> unknown nosy:
droundy, tommy, zooko messages:
+ msg301 |
2006-01-13 14:43:07 | droundy | set | nosy:
droundy, tommy, zooko |
2006-01-13 14:43:15 | droundy | set | priority: feature -> bug nosy:
droundy, tommy, zooko |
2007-07-16 09:27:11 | kowey | set | topic:
+ Performance nosy:
+ kowey, beschmi |
2008-01-31 04:03:28 | markstos | set | topic:
+ Confirmed, Darcs2, IncludesExampleOrTest nosy:
+ markstos messages:
+ msg2951 title: memory usage is 3X patch size, and darcs record took at least 35 minutes -> record: memory usage is 2X patch size |
2008-01-31 16:27:00 | droundy | set | nosy:
droundy, tommy, beschmi, kowey, markstos, zooko messages:
+ msg2981 |
2008-02-16 18:59:22 | markstos | link | issue172 superseder |
2008-02-16 19:03:10 | markstos | set | status: unknown -> deferred nosy:
droundy, tommy, beschmi, kowey, markstos, zooko superseder:
+ Release Darcs 2.0 messages:
+ msg3503 |
2008-05-14 13:03:16 | kowey | set | status: deferred -> unknown nosy:
+ Serware, dagit superseder:
- Release Darcs 2.0 messages:
+ msg4675 |
2008-09-06 12:21:07 | gwern | set | nosy:
+ gwern |
2008-10-07 14:54:23 | droundy | set | priority: bug -> feature nosy:
+ dmitry.kurochkin, simon, thorkilnaur |
2008-10-07 14:54:37 | droundy | set | nosy:
droundy, tommy, beschmi, kowey, markstos, zooko, dagit, simon, thorkilnaur, gwern, dmitry.kurochkin, Serware |
2009-08-06 17:39:41 | admin | set | nosy:
+ jast, darcs-devel, mornfall, - droundy, gwern |
2009-08-06 20:36:38 | admin | set | nosy:
- beschmi |
2009-08-10 22:19:54 | admin | set | nosy:
+ gwern, - darcs-devel, jast, mornfall |
2009-08-11 00:10:48 | admin | set | nosy:
- dagit |
2009-08-17 05:03:03 | kowey | set | topic:
- Darcs2 nosy:
tommy, kowey, markstos, zooko, simon, thorkilnaur, gwern, dmitry.kurochkin, Serware |
2009-08-17 17:10:16 | kowey | set | nosy:
tommy, kowey, markstos, zooko, simon, thorkilnaur, gwern, dmitry.kurochkin, Serware superseder:
+ wish: "chunky" representation for hunks |
2009-08-23 18:02:45 | kowey | set | topic:
- Confirmed, IncludesExampleOrTest nosy:
tommy, kowey, markstos, zooko, simon, thorkilnaur, gwern, dmitry.kurochkin, Serware messages:
+ msg8416 |
2009-08-23 18:03:19 | kowey | set | status: unknown -> needs-reproduction nosy:
tommy, kowey, markstos, zooko, simon, thorkilnaur, gwern, dmitry.kurochkin, Serware |
2009-08-25 17:30:33 | admin | set | nosy:
+ darcs-devel, - simon |
2009-08-26 18:00:07 | kowey | set | nosy:
tommy, kowey, markstos, darcs-devel, zooko, thorkilnaur, gwern, dmitry.kurochkin, Serware |
2009-08-27 14:30:53 | admin | set | nosy:
tommy, kowey, markstos, darcs-devel, zooko, thorkilnaur, gwern, dmitry.kurochkin, Serware |
2009-10-23 22:42:51 | admin | set | nosy:
+ serware, - Serware |
2009-10-23 23:28:47 | admin | set | nosy:
+ Serware, - serware |
2012-07-16 21:42:57 | lukeworth | set | nosy:
+ lukeworth |
2020-07-30 17:54:05 | bfrk | set | priority: feature -> urgent messages:
+ msg22309 |
2020-07-30 17:54:47 | bfrk | set | status: needs-reproduction -> needs-testcase |
2020-07-30 17:55:28 | bfrk | set | title: record: memory usage is 2X patch size -> memory usage is up to 20X patch size |
|