|
Created on 2010-02-18.19:01:37 by quick, last changed 2020-07-31.21:46:56 by bfrk.
msg10019 |
Author: quick |
Date: 2010-02-18.19:01:34 |
|
I caught newer darcs wasting time reading files it should be ignoring.
In worst-case scenarios this causes it to fail by running out of
memory. I discovered this because I happened to copy a couple of very
large datafiles into my working tree.
$ cd [some-large-darcs-repo]
$ darcs-2.0.2 -v
2.0.2 (release)
$ darcs.net -v
2.3.1 (+ 468 patches)
$ darcs show repo
Type: darcs
Format: hashed
Pristine: HashedPristine
Num Patches: 3491
$ darcs show files | wc -l
5053
$
The darcs.net build is up to date as of today, 18 Feb 2010.
$ time darcs-2.0.2 w -l
M filea
M fileb
real 1m10.187s
user 0m31.342s
sys 0m6.154s
$ time darcs.net w -l
M filea
M fileb
real 0m53.732s
user 0m33.392s
sys 0m1.899s
And now to create the problem:
$ dd if=/dev/zero of=datafile1 bs=1024 count=1750000
[creates a 1.7GB file]
$ dd if=/dev/zero of=datafile2 bs=1024 count=1750000
$ time darcs-2.0.2 w -l
M filea
M fileb
a ./datafile1
a ./datafile2
real 1m4.777s
user 0m31.536s
sys 0m5.669s
Old darcs noticed the new files but other than that, it clearly didn't
waste any time on them. Not so with new darcs:
$ time darcs.net w -l
darcs: out of memory (requested 1808793600 bytes)
real 1m42.485s
user 0m8.956s
sys 0m28.924s
Tail of darcs.net --exact-version:
Compiled with:
HTTP-4000.0.9
array-0.2.0.0
base-4.1.0.0
bytestring-0.9.1.4
containers-0.2.0.1
directory-1.0.0.3
extensible-exceptions-0.1.1.0
filepath-1.1.0.2
hashed-storage-0.4.7
haskeline-0.6.2.2
html-1.0.1.2
mmap-0.4.1
mtl-1.1.0.2
network-2.2.1.7
old-time-1.0.0.2
parsec-3.0.1
process-1.0.1.1
random-1.0.0.1
regex-compat-0.92
terminfo-0.3.1.1
text-0.7.1.0
unix-2.3.2.0
zlib-0.5.2.0
|
msg10020 |
Author: quick |
Date: 2010-02-18.19:11:22 |
|
Even though the original example used extremely large files to
demonstrate the problem, I wanted to point out that there is a negative
aspect to this even if you don't have this extreme case.
On my system, the size of the darcs executable itself is 12MB. This
means that if I'm working on darcs itself, every "$ darcs w -l" spends
time reading this 12MB file (and object files, etc.).
On the heels of this is the thought that we should be extending
Haskell's intrinsic laziness to this area as well, for darcs
summary-mode whatsnew operations:
* darcs should note the existence of new files, but do nothing more
with that file
* for an existing file, finding the *first* difference in the file
should be sufficient to report it as modified: no need to scan the rest
of the file.
* for a removed directory, everything beneath that directory must
have been removed as well: no need to actually check that.
Just some general thoughts; whatsnew is probably one of the most
frequently issued commands so performance is key for this one.
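
The three points above can be sketched roughly as follows. This is only an illustration in Python, not darcs code: it assumes the recorded state and the working tree are two plain directories (darcs actually keeps a hashed pristine store), and the names `files_differ`, `summarize` and `CHUNK` are made up for this example.

```python
import os

CHUNK = 64 * 1024  # read size in bytes; an arbitrary choice for this sketch


def files_differ(path_a, path_b, chunk=CHUNK):
    """Return True as soon as the first differing chunk is found."""
    # Different sizes imply different contents, so nothing needs to be read.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return True
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk)
            b = fb.read(chunk)
            if a != b:
                return True   # first difference found: stop reading
            if not a:
                return False  # both files ended without a difference


def summarize(recorded, working, rel=""):
    """Yield (status, path) pairs: 'a' added, 'M' modified, 'R' removed.

    Assumption: `recorded` and `working` are plain directory trees.
    """
    rec_names = set(os.listdir(os.path.join(recorded, rel)))
    wrk_names = set(os.listdir(os.path.join(working, rel)))
    for name in sorted(rec_names | wrk_names):
        path = os.path.join(rel, name)
        rec = os.path.join(recorded, path)
        wrk = os.path.join(working, path)
        if name not in rec_names:
            yield ("a", path)  # new entry: note its existence, never open it
        elif name not in wrk_names:
            yield ("R", path)  # removed: no need to descend into a directory
        elif os.path.isdir(rec) and os.path.isdir(wrk):
            yield from summarize(recorded, working, path)
        elif os.path.isdir(rec) or os.path.isdir(wrk):
            yield ("M", path)  # file replaced by directory or vice versa
        elif files_differ(rec, wrk):
            yield ("M", path)
```

With this shape, a freshly copied 1.7GB datafile costs one directory listing entry and no reads, and a modified file is read only up to its first differing chunk.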
|
msg10026 |
Author: kowey |
Date: 2010-02-19.10:49:09 |
|
Comments, Petr? I know we all want to see 2.4 go out the door, but is
there any chance this is a blocker?
[Sorry Kevin if the answer to that is self-evident; I'm in skim and
delegate mode, here]
|
msg10034 |
Author: mornfall |
Date: 2010-02-19.15:39:50 |
|
Well, the summary mode(s) are implemented by taking the full patch and
summarising it. There was probably some non-obvious laziness hack
involved in older versions that avoided constructing the full patch.
Currently, the code is probably more strict and therefore computes more.
Of course this is something that could be improved, but is not top
priority. Keeping large non-boring files around in a repository is very
rare, and boring files are not read so this is a non-issue.
I think that a proper fix here would be to implement a simple summary
mode that wouldn't rely on generating a patch sequence instead of the
current one. Let's aim for 2.5.
|
msg11112 |
Author: hoijarvi |
Date: 2010-05-25.15:08:13 |
|
I just ran into this and created issue1851, which I have now marked as a
dupe. This can silently kill TortoiseDarcs; the workaround is to mark
those files as boring.
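
To illustrate the workaround: darcs reads its boring patterns (regular expressions, one per line) from `_darcs/prefs/boring` by default, and paths matching any of them are skipped. The sketch below only imitates that filtering in Python. The patterns are hypothetical examples for the datafiles in this report, Python's `re` is an approximation of the regex engine darcs uses, and `is_boring` and `candidates` are names invented for this example.

```python
import re

# Hypothetical patterns of the kind one might add to _darcs/prefs/boring.
BORING_PATTERNS = [
    r"(^|/)datafile[0-9]+$",  # the large data files from this report
    r"\.o$",                  # object files
]


def is_boring(path, patterns=BORING_PATTERNS):
    """True if any pattern matches somewhere in the path."""
    return any(re.search(p, path) for p in patterns)


def candidates(paths, patterns=BORING_PATTERNS):
    """Keep only the paths that would still be considered for adding."""
    return [p for p in paths if not is_boring(p, patterns)]
```

Because the first pattern is anchored with `(^|/)`, it matches both `datafile1` and `./datafile1`, so the sketch does not depend on whether paths carry a leading `./`.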
|
msg11297 |
Author: kowey |
Date: 2010-06-07.08:21:07 |
|
Marking needs-implementation, since it looks like we have an idea of how
to tackle this, if I'm reading Petr correctly when he says:
> I think that a proper fix here would be to implement a simple summary
> mode that wouldn't rely on generating a patch sequence instead of the
> current one. Let's aim for 2.5.
|
msg11406 |
Author: tux_rocker |
Date: 2010-06-14.06:48:37 |
|
I may make an effort to have this fixed before 2.5. While it may be a
darcser's second nature to avoid large files in a repo, it's not for
others. It makes darcs look stupid if the mere presence of a few
1.7-gigabyte files makes it crash.
|
msg11608 |
Author: tux_rocker |
Date: 2010-06-27.18:33:27 |
|
I sounded very brave before, but I'm afraid I'm not able to live up
to it. If anyone else would be so kind as to take a stab at it, we'd be
very grateful.
|
msg14757 |
Author: markstos |
Date: 2011-10-13.13:03:24 |
|
It's not a regression since 2.5, so bumping to 2.10.
|
|
Date | User | Action | Args |
2010-02-18 19:01:37 | quick | create | |
2010-02-18 19:11:25 | quick | set | messages: + msg10020 |
2010-02-19 10:49:15 | kowey | set | topic: + Performance, Regression; nosy: + tux_rocker, kowey, mornfall; messages: + msg10026; title: reading files it should not -> whatsnew -l reading files it should not (2.4) |
2010-02-19 10:49:25 | kowey | set | status: unknown -> needs-reproduction |
2010-02-19 10:49:48 | kowey | set | topic: + Target-2.4 |
2010-02-19 15:40:01 | mornfall | set | topic: + Target-2.5, - Target-2.4; messages: + msg10034 |
2010-03-22 12:25:48 | kowey | link | issue1762 superseder |
2010-05-25 15:05:17 | hoijarvi | link | issue1851 superseder |
2010-05-25 15:08:14 | hoijarvi | set | nosy: + hoijarvi; messages: + msg11112 |
2010-06-07 08:21:08 | kowey | set | status: needs-reproduction -> needs-implementation; nosy: - darcs-devel; messages: + msg11297 |
2010-06-14 06:48:37 | tux_rocker | set | assignedto: tux_rocker; messages: + msg11406 |
2010-06-15 20:52:06 | admin | set | milestone: 2.5.0 |
2010-06-15 20:59:39 | admin | set | topic: - Target-2.5 |
2010-06-27 18:33:28 | tux_rocker | set | assignedto: tux_rocker -> (none); messages: + msg11608 |
2010-07-25 14:29:33 | tux_rocker | set | milestone: 2.5.0 -> 2.8.0 |
2011-10-13 13:03:25 | markstos | set | messages: + msg14757; milestone: 2.8.0 -> 2.10.0 |
2015-04-18 17:39:41 | gh | set | milestone: 2.10.0 -> 2.12.0 |
2020-07-31 21:46:56 | bfrk | set | milestone: 2.12.0 -> (none) |
|