darcs

Issue 79 whatsnew -ls loads complete contents of files into memory

Title: whatsnew -ls loads complete contents of files into memory
Priority: wishlist
Status: resolved
Milestone:
Resolved in:
Superseder:
Nosy List: darcs-devel, dmitry.kurochkin, kowey, markstos, thorkilnaur, tommy, zooko
Assigned To:
Topics: Darcs2, Performance

Created on 2006-01-04.16:28:57 by zooko, last changed 2009-08-27.13:57:12 by admin.

Messages
msg299 Author: zooko Date: 2006-01-04.16:28:56
To reproduce:

$ time head --bytes=17179869184 /dev/zero > bigtempfile
$ darcs init
$ darcs whatsnew -l

To fix:

Process files in a lazy manner.  Perhaps you could even use a programming
language which has laziness built in!  :-)
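
(Editorial illustration, not darcs code: a minimal sketch of what "processing
files in a lazy manner" can look like in Haskell. The helper name sameContents
is hypothetical. It reads both files as lazy ByteStrings, so they are compared
chunk by chunk and the comparison stops at the first difference, rather than
loading either file whole.)

```haskell
import qualified Data.ByteString.Lazy as BL
import System.IO (IOMode (ReadMode), withFile)

-- | Hypothetical helper: True when both files hold the same bytes.
-- Lazy ByteStrings are read in chunks on demand, so memory use stays
-- bounded by the chunk size rather than by the file size.
sameContents :: FilePath -> FilePath -> IO Bool
sameContents a b =
  withFile a ReadMode $ \ha ->
    withFile b ReadMode $ \hb -> do
      xs <- BL.hGetContents ha
      ys <- BL.hGetContents hb
      -- Force the Bool now, while both handles are still open.
      return $! xs == ys
```

The important detail is that nothing else keeps a reference to xs or ys, so
chunks that have already been compared can be garbage collected.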

(Thanks to Brian Warner for bringing this bug to my attention.)

Regards,

Zooko
msg304 Author: dagit Date: 2006-01-05.03:05:32
Zooko,

It's wonderful you've created such a simple test case.

If we assume that pristine is available, there are no pending patches  
and we know which files are modified, could darcs compare all the  
filenames in pristine and the working copy?  If yes, why does darcs  
do anything different?  Does darcs need to actually parse old patches  
to solve this problem with full generality?  If pristine is missing  
would darcs have to reconstruct it before whatsnew works?

How efficient is the 'darcs whats' without the check for new files?   
I added a file which took about a minute to add and then typed 'darcs  
whats' and it takes less than a second.  If I modify the file I can  
increase the time to about 3 seconds for the whatsnew.  This makes me  
think that the case for --look-for-adds should just do a directory  
listing comparison after we know which files were modified (which  
seems to be quick).  Possibly it could even build up the directory  
listing as it goes, but I'm not familiar with that code at all.
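
(Editorial illustration, not darcs code: a rough sketch of the directory
listing comparison suggested above. The names unaddedFiles and candidateAdds
are hypothetical, the pristine tree is assumed to be available as a plain
directory, and only the top level is listed. Only file names are read, never
file contents.)

```haskell
import qualified Data.Set as Set
import System.Directory (listDirectory)

-- | Names present in the working tree but absent from pristine.
unaddedFiles :: Set.Set FilePath -> Set.Set FilePath -> Set.Set FilePath
unaddedFiles pristine working = Set.difference working pristine

-- | Hypothetical wrapper: list both directories (top level only) and
-- report candidate new files. The _darcs directory is excluded by name.
candidateAdds :: FilePath -> FilePath -> IO [FilePath]
candidateAdds pristineDir workingDir = do
  old <- Set.fromList <$> listDirectory pristineDir
  new <- Set.fromList <$> listDirectory workingDir
  return (Set.toAscList (unaddedFiles old (Set.delete "_darcs" new)))
```

A real version would have to recurse into subdirectories and honour the
boring-file patterns; the sketch only shows that the comparison itself does
not depend on file size.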

Jason

msg309 Author: dagit Date: 2006-01-05.18:13:01
There are at least two places where we need to consider --look-for-adds:
during darcs record and during darcs whatsnew.

In the case of whatsnew, --look-for-adds has an implicit --summary.  So by
default during whatsnew any diff that is performed should stop as soon as it
realizes a file needs to be added.  Currently darcs grabs the file contents so
it can create the patch (in all cases).  Since Haskell is lazy I think it was
assumed that it wouldn't get the file contents until it needs to display the
patch.  For whatever reason the file contents are demanded even in summary
mode.  My workaround simply checks for either --summary or the lack of
--no-summary and then skips returning a patch of the file contents in those
cases.

In the case of record, there is an implicit --no-summary.

My proposed change is to do a 'short circuit' sort of thing.  If --summary is
present or --no-summary is missing then darcs will skip the patch with the
contents of the file.  I'm not sure if this is preferable to figuring out where
the strictness is and fixing it.  That solution would be elegant, but I think
this problem could get reintroduced later by a seemingly unrelated change that
adds the strictness back.  So in this case I would argue that the simple short
circuit check is better.  Plus, I haven't been able to determine where or why
the patch is being evaluated (causing the file to be read).
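
(Editorial illustration, not the actual patch: a minimal sketch of the short
circuit described above. The types Change and Mode and the function
describeNewFile are hypothetical stand-ins, not darcs's real patch types. In
summary mode the file is never opened, so no later change in strictness
elsewhere can cause its contents to be read.)

```haskell
-- | Hypothetical, simplified stand-in for a patch.
data Change
  = AddFile FilePath      -- the file is new
  | Hunk FilePath String  -- the file's contents
  deriving (Eq, Show)

-- | Whether the caller wants a summary or the full patch.
data Mode = Summary | Full

-- | Describe a file found by --look-for-adds.
describeNewFile :: Mode -> FilePath -> IO [Change]
describeNewFile Summary path =
  -- Short circuit: report the addition without touching the file.
  return [AddFile path]
describeNewFile Full path = do
  contents <- readFile path
  return [AddFile path, Hunk path contents]
```

Because the Summary branch contains no read at all, it works even on a path
that cannot be read, which is an easy property to check in a test.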

Comments?  If this approach sounds good I can probably have a proper patch 
ready sometime early next week (busy all weekend).
msg660 Author: zooko Date: 2006-05-18.15:19:38
According to a comment in issue #170:

http://bugs.darcs.net/issue170

This bug can be fixed with this patch:

http://article.gmane.org/gmane.comp.version-control.darcs.devel/3866/match=whatsnew
msg662 Author: zooko Date: 2006-05-18.15:40:10
My brother is trying darcs on my recommendation, and because of the abysmal
performance of "darcs whatsnew -s -l", he is complaining, and his boss/friend is
suggesting that he switch to git which is ever so much faster.  So this
motivates me to check up on this issue.


I explored the darcs-devel mailing list e.g.

http://article.gmane.org/gmane.comp.version-control.darcs.devel/3902

and I explored my copy of darcs, e.g.

darcs changes | grep -iEe"fixes.*for.*(test.*suite|issue.*79)"

and as far as I can tell this patch was just accidentally dropped.

This demonstrates how we need better patch management in darcs development.  If
it turns out that the patch was accidentally dropped, that is a problem that we
should institute some tools to fix.  If it turns out that the patch *was*
included, but in such a way that an archaeologist such as myself can't tell that
it was included, then that too is a problem that we should fix.

Regards,

Zooko
msg668 Author: zooko Date: 2006-05-18.18:45:33
> This demonstrates how we need better patch management in darcs development.
>   If it turns out that the patch was accidentally dropped, that is a 
> problem that we should institute some tools to fix.

One possible future tool is trac+darcs.  Currently it has some bugs [1] and
limitations, but it might be better soon.

Regards,

Zooko

[1] http://progetti.arstecnica.it/trac+darcs/report/1
msg2888 Author: markstos Date: 2008-01-30.02:49:36
I ran a variation of Zooko's test case by creating a 1.5 Gig file, which is more
memory than I have (both physically and virtually), and then trying "darcs
whatsnew -l file" with it.

Using both Darcs 1.0.9 and Darcs 2, the command is nearly instant, and can't
possibly be loading the file into memory. 

However, then if I "add" the file, that is also instant, but "whatsnew -ls" then
fails:

 darcs w -ls
 darcs: out of memory (requested 1591738368 bytes)

So to refine Zooko's case some: Darcs appears to be attempting to read complete
copies of /added/ files when "darcs whatsnew -l" is run.
msg4076 Author: droundy Date: 2008-03-28.19:21:44
I think rather than renaming this bug to refer to a different feature (darcs
whatsnew -s loads new files into memory), we should perhaps close it.  The
original bug was fixed by Jason's patch, which is in the repository (and has
been for I don't know how long).

David
History
Date                 User      Action  Args
2006-01-04 16:28:57  zooko     create
2006-01-05 03:05:33  dagit     set     status: unread -> unknown
                                       nosy: + dagit
                                       messages: + msg304
2006-01-05 18:13:02  dagit     set     nosy: droundy, tommy, zooko, dagit
                                       messages: + msg309
2006-05-18 15:19:41  zooko     set     nosy: droundy, tommy, zooko, dagit
                                       messages: + msg660
2006-05-18 15:40:13  zooko     set     nosy: droundy, tommy, zooko, dagit
                                       messages: + msg662
2006-05-18 18:45:36  zooko     set     nosy: droundy, tommy, zooko, dagit
                                       messages: + msg668
2006-07-04 12:04:17  droundy   set     nosy: droundy, tommy, zooko, dagit
                                       title: whatsnew -l loads complete contents of files into memory -> whatsnew -ls loads complete contents of files into memory
2007-07-20 14:01:41  kowey     set     topic: + Performance
                                       nosy: + kowey, beschmi
2007-08-29 20:02:55  kowey     link    issue201 superseder
2008-01-30 02:49:38  markstos  set     topic: + Darcs2
                                       nosy: + markstos
                                       messages: + msg2888
                                       title: whatsnew -ls loads complete contents of files into memory -> whatsnew -ls loads complete contents of added files into memory
2008-01-31 03:49:23  markstos  link    issue49 superseder
2008-01-31 16:07:03  zooko     link    issue170 superseder
2008-02-11 01:13:14  markstos  set     status: unknown -> deferred
                                       nosy: droundy, tommy, beschmi, kowey, markstos, zooko, dagit
2008-03-28 19:21:46  droundy   set     status: deferred -> resolved
                                       nosy: droundy, tommy, beschmi, kowey, markstos, zooko, dagit
                                       messages: + msg4076
                                       title: whatsnew -ls loads complete contents of added files into memory -> whatsnew -ls loads complete contents of files into memory
2009-08-06 17:48:03  admin     set     nosy: + jast, Serware, dmitry.kurochkin, darcs-devel, mornfall, simon, thorkilnaur, - droundy
2009-08-06 20:43:48  admin     set     nosy: - beschmi
2009-08-10 22:19:40  admin     set     nosy: - darcs-devel, jast, Serware, mornfall
2009-08-11 00:10:26  admin     set     nosy: - dagit
2009-08-25 17:59:48  admin     set     nosy: + darcs-devel, - simon
2009-08-27 13:57:12  admin     set     nosy: tommy, kowey, markstos, darcs-devel, zooko, thorkilnaur, dmitry.kurochkin