darcs

Issue 498 progress indicators

Title progress indicators
Priority wishlist Status resolved
Milestone Resolved in
Superseder option to show commute progress (e.g. pull --very-verbose)
View: 72
Nosy List darcs-devel, dmitry.kurochkin, kowey, markstos, quick, thorkilnaur, tommy, zooko
Assigned To
Topics

Created on 2007-07-20.20:08:35 by zooko, last changed 2009-08-27.14:07:07 by admin.

Messages
msg1881 (view) Author: zooko Date: 2007-07-20.20:08:34
As per this e-mail message:

http://lists.osuosl.org/pipermail/darcs-devel/2007-July/005927.html

If darcs is in the process of doing something (in this case fetching  
metadata from a remote repository), it doesn't do a good enough job  
of informing the user about its progress.  (It should do this when  
the user has passed "-v -v -v" to indicate that she wants verbose  
output.)

Progress indicators might seem like a mere "decoration", but they are  
actually very important to the user experience, so this is not an  
unimportant feature request.

Regards,

Zooko
msg1882 (view) Author: kowey Date: 2007-07-20.20:48:15
Thanks Zooko,

> Progress indicators might seem like a mere "decoration", but they are  
> actually very important to the user experience, so this is not an  
> unimportant feature request.

I think we all agree that this is very important stuff.  You might want to have
a look at the related issue72.

Also, just to check, is it really both darcs get and darcs pull where Rob is
experiencing the problems?

The reason I ask is that if it was just darcs pull, we have an idea why it's
so slow (lots of commutation).  And for this, we really ought to have some
extra progress indicators.

But if darcs get was also slow and silent, then I am puzzled, because then
the issue is likely something more mundane like fetching files over HTTP.
And darcs get has plenty of verbosity there, so it wouldn't be both slow
and silent.

So just to confirm, is it really just darcs pull that has this silent
behaviour?
msg1889 (view) Author: quick Date: 2007-07-21.23:06:21
On 20 Jul 2007, at 1:47 PM, Eric Y. Kow wrote:

> Thanks Zooko,
>
>> Progress indicators might seem like a mere "decoration", but they are
>> actually very important to the user experience, so this is not an
>> unimportant feature request.
>
> I think we all agree that this is very important stuff.  You might want to
> have a look at the related issue72.
>
> Also, just to check, is it really both darcs get and darcs pull where Rob is
> experiencing the problems?
>
> The reason I ask is that if it was just darcs pull, we have an idea why it's
> so slow (lots of commutation).  And for this, we really ought to have some
> extra progress indicators.

I did some experiments on a local repository that I access remotely  
via SSH.  It turns out that a lot of the time is consumed fetching  
patches via SSH and there's no -v progress indicators for when this  
is happening.  The commutation was negligible in my case.

I then got the (at the time seemingly) bright idea that I could use  
David's new caching (_darcs/prefs/sources) to cache SSH-fetched  
patches.  Thus the patch fetching time would only be paid for the  
first retrieval.  After a little Haskelling, I had this up and  
running and it did indeed help significantly: my original "darcs pull  
--dry-run" (with ssh control-mastering) was 1:47 (one minute, 47  
seconds)... the new version with caching dropped that time to under 6  
seconds for subsequent pulls.

Alas, I realized I had leapt before I'd looked.  A patch (for a
DarcsRepo at least) is only valid in the context of its repository,
but the caching doesn't store the repository information.

Details:
   * Imagine a repository (R1) containing file foo.txt.
   * Create two new repositories from R1 (called R2 and R3, oddly enough).
   * In R1, add some lines to the beginning of foo.txt and record that change.
   * Next in R1, add some lines to the middle of foo.txt and record that change.
   * In R2, pull only the second patch from R1.  This will cause the patch to
     be rewritten to change the hunk offsets to account for the lack of the
     first patch.  However, the patch name and internal PatchInfo are the same
     for that patch.
   * In R3, do a dry-run pull from R1, caching R1's patches locally.  Then do
     a pull -a from R2: the problem occurs because darcs will use the locally
     cached patch, which is not the patch from R2 (wrong hunk offsets).
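
The scenario above can be sketched as a toy model.  This is only an
illustration, not darcs code: the Patch and NameKeyedCache classes, the patch
names, and the line numbers 8 and 5 are all made up for the example.  It
assumes a cache that looks patches up by name alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    name: str       # stands in for the patch name / PatchInfo
    hunk_line: int  # line at which the hunk applies in that repository


# R1 holds both patches; the second hunk sits below the lines added by the first.
r1 = {"add-header": Patch("add-header", 1), "add-middle": Patch("add-middle", 8)}
# R2 pulled only the second patch, so its copy was rewritten to a smaller offset.
r2 = {"add-middle": Patch("add-middle", 5)}


class NameKeyedCache:
    """Hypothetical cache that remembers patches by name only."""

    def __init__(self):
        self.store = {}

    def fetch(self, repo, name):
        # The repository is consulted only on a cache miss.
        if name not in self.store:
            self.store[name] = repo[name]
        return self.store[name]


cache = NameKeyedCache()
for name in r1:                      # dry-run pull from R1 fills the cache
    cache.fetch(r1, name)

got = cache.fetch(r2, "add-middle")  # pull -a from R2 hits the cache
print(got.hunk_line, r2["add-middle"].hunk_line)  # prints "8 5"
```

The cached copy carries R1's offset even though R2 was asked, which is the
mismatch described in the last step.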

At this point, I have an alternative suggestion for people facing  
these types of performance problems, and some questions for David or  
more knowledgeable folks:

Suggestion: create a local "mirror" of the remote repository.   
Periodically just do "push -a" and "pull -a" to keep the mirror in  
sync with the remote repository.  Then do all the local working  
repository pushes and pulls against the local mirror; it's local so  
there's no ssh (or http) access overhead, and it should be much faster.

Questions:
   * I looked back over the reflector mail, but I'm afraid I'm still a little
     foggy over the new HashedRepo format and how it's different from the
     old-style DarcsRepo.  And particularly, how is it valid to cache patches
     for a HashedRepo in a manner that avoids the above problem for
     DarcsRepo-style patches?

     [If David, Eric, or one of the other darcs illuminati would send me notes
     on HashedRepo details, I'd be happy to assist in converting that to a
     lengthier exposition for inclusion in darcs docs.  If this is already
     there and I've just overlooked it, please just clue me in.  :-)  ]

   * Does the caching sound interesting enough to warrant resolving the above
     issue in some manner, or is the suggested workaround above the simplest
     and recommended way to handle this?
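
For reference, one general way a cache can avoid mixing up two differently
rewritten copies of the same patch is to key entries by a hash of the patch
contents rather than by the patch name.  The sketch below only illustrates
that idea; it is an assumption that it resembles what HashedRepo does, and
the byte strings and the content_key helper are made up for the example.

```python
import hashlib


def content_key(patch_bytes: bytes) -> str:
    """Key a cache entry by the SHA-256 of the stored patch bytes."""
    return hashlib.sha256(patch_bytes).hexdigest()


# Two copies of "the same" patch whose hunk offsets differ (made-up contents).
r1_copy = b"hunk ./foo.txt 8\n+middle line\n"
r2_copy = b"hunk ./foo.txt 5\n+middle line\n"

cache = {}
for copy in (r1_copy, r2_copy):
    cache[content_key(copy)] = copy

# Different bytes give different keys, so neither copy shadows the other.
print(len(cache))  # prints "2"
```

With such a scheme the inventory would have to name the hash it expects, so
a lookup for R2's copy could never return R1's.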

If the caching is interesting, here's some possible thoughts for  
making it work:
   * In the cache: writable directories, use a subdirectory that is named or
     has a special file that specifies the remote repository that the entries
     in that cache came from.
   * To fully support the above, the repositories probably need to have an
     internal file that indicates "last modified date" for their repository.
     Anything which might modify a patch file after it was stored in the
     repository (unpull/unrecord/optimize+reorder/amend) would change
     "modified date".  Any cache-based access from a remote repo would now
     pull this "modified date" information along with the inventory and
     compare it to the contents of the cache, wiping the cache if the source
     was modified more recently.  Note that record and pull operations would
     *not* change the modified date, since they don't invalidate previous
     patches that might have been cached.
   * Alternatively, add a 5th _darcs/prefs/sources type which is a
     "mirror-repo:".  Currently, a repo: entry in sources is read-only.  A
     mirror-repo: would be writeable.  This would essentially be an automation
     of the above workaround suggestion, but it would allow the local mirror
     to be automatically updated when working in other local repos.
     Unfortunately, there's some issues to consider for this method with
     respect to disallowing other types of updates to the mirror repo and
     things like that.
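
A minimal sketch of the first two ideas, assuming a per-source cache that
records the "modified date" it last saw.  The SourceCache class, its method
names, and the integer timestamps are hypothetical; nothing like this exists
in darcs as far as this message knows.

```python
class SourceCache:
    """Hypothetical cache of patches fetched from one remote repository."""

    def __init__(self, source):
        self.source = source       # which remote repository the entries came from
        self.seen_modified = None  # last "modified date" reported by that source
        self.patches = {}          # patch name -> patch bytes

    def sync(self, remote_modified):
        """Wipe the cache if the source was modified after our last look.

        Returns True when the cache was wiped.
        """
        wiped = False
        if self.seen_modified is not None and remote_modified > self.seen_modified:
            self.patches.clear()
            wiped = True
        self.seen_modified = remote_modified
        return wiped


cache = SourceCache("R1")
cache.sync(100)                        # first contact: nothing to invalidate
cache.patches["add-middle"] = b"..."   # cache a fetched patch

print(cache.sync(100))  # record/pull on the remote leave the date alone: False
print(cache.sync(200))  # unpull/amend on the remote bumped the date: True
print(cache.patches)    # prints "{}"
```

Keeping one such object per remote is what the per-source subdirectory in the
first bullet would amount to on disk.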

Back to Zooko's original issue, I think feedback is definitely desirable.
Although the user should be allowed to explicitly specify the verbosity of a
particular operation, I'd like to suggest (1) having a timing threshold
(obtained from a prefs file or environment variable) that defaults to
something like 5 seconds, so that any operation whose elapsed time exceeds
this threshold increases its verbosity, and (2) making any verbosity controls
take their defaults from environment variables or user prefs files.  Thus the
normal stuff is fast and quiet, but at about the time the user starts
wondering what's going on, darcs starts telling them.  And making it an
environment variable/prefs file means that it would always be in place;
requiring it on the command line means that a user starts a command, then
after the pain threshold is crossed, they have to Ctrl-C abort it and start
from the beginning with more verbosity flags.
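
Roughly what I have in mind, as a sketch only.  The environment variable name
DARCS_PROGRESS_THRESHOLD, the QuietUntilSlow class, and the threshold_seconds
helper are all invented for illustration; darcs has no such setting today.

```python
import os
import time


def threshold_seconds(env=os.environ, default=5.0):
    """Read the threshold from a (hypothetical) environment variable."""
    try:
        return float(env.get("DARCS_PROGRESS_THRESHOLD", default))
    except ValueError:
        return default  # fall back if the variable is not a number


class QuietUntilSlow:
    """Drop progress messages until the operation has run past the threshold."""

    def __init__(self, threshold, clock=time.monotonic):
        self.threshold = threshold
        self.clock = clock
        self.start = clock()

    def report(self, message):
        """Print the message only once the threshold is crossed."""
        if self.clock() - self.start >= self.threshold:
            print(message)
            return True
        return False


# Usage with a fake clock so the example does not have to sleep.
now = [0.0]
progress = QuietUntilSlow(threshold_seconds({}), clock=lambda: now[0])
progress.report("fetching patch 1 of 40")  # silent: 0 seconds elapsed
now[0] = 6.0
progress.report("fetching patch 7 of 40")  # printed: past the 5 second default
```

The point of taking the default from the environment is that nothing has to
be restarted: the same run that was quiet becomes chatty on its own.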

OK, enough rambling.  Feedback/info on the above would be appreciated.

-KQ

msg1890 (view) Author: kowey Date: 2007-07-22.04:20:29
On Sat, Jul 21, 2007 at 15:56:11 -0700, Kevin Quick wrote:
> I did some experiments on a local repository that I access remotely  
> via SSH.  It turns out that a lot of the time is consumed fetching  
> patches via SSH and there's no -v progress indicators for when this  
> is happening.  The commutation was negligible in my case.

Ah, the dangers of making assumptions.  Thanks for pointing out this
possibility, Kevin!

In fact, this sort of lines up nicely with Zooko saying that it was
about synching with remote repos.  Zooko, have you tried the chatty
ssh trick to log your ssh calls?
  http://wiki.darcs.net/index.html/DeveloperTips

You might be able to watch (i.e. the 'watch' cmd) your ssh call log
and see what is happening.

One thing I wonder is if it is literally fetching the patches over SSH
or something even simpler.  For example, when making SSH connections at
work, I always get a little delay unless I force it to use IPv4
addresses only.  Never really figured out why; all I remember was that
without this setting, ssh would hang for like 30 seconds (timeout?)
before successfully connecting.  Zooko, are you sure it's not something
ridiculously simple like that?

Haven't read the rest of Kevin's mail yet ( sorry :-) )
-- 
Eric Kow                     http://www.loria.fr/~kow
PGP Key ID: 08AC04F9         Merci de corriger mon français.
msg2887 (view) Author: markstos Date: 2008-01-30.02:33:23
There is now a progress reporting framework in the unstable branch.
History
Date                 User      Action  Args
2007-07-20 20:08:36  zooko     create
2007-07-20 20:48:17  kowey     set     status: unread -> unknown
                                       nosy: + darcs-devel
                                       messages: + msg1882
2007-07-20 21:00:34  kowey     set     nosy: - darcs-devel
                                       superseder: + option to show commute progress (e.g. pull --very-verbose)
2007-07-21 23:06:22  quick     set     nosy: + darcs-devel, quick
                                       messages: + msg1889
2007-07-22 04:20:31  kowey     set     messages: + msg1890
2008-01-30 02:33:24  markstos  set     status: unknown -> resolved-in-unstable
                                       nosy: + markstos
                                       messages: + msg2887
2008-09-04 21:31:13  admin     set     status: resolved-in-unstable -> resolved
                                       nosy: + dagit
2009-08-06 17:33:54  admin     set     nosy: + jast, Serware, dmitry.kurochkin, mornfall, simon, thorkilnaur, - droundy, quick
2009-08-06 20:31:20  admin     set     nosy: - beschmi
2009-08-10 22:06:13  admin     set     nosy: + quick, - jast, Serware, mornfall
2009-08-11 00:01:35  admin     set     nosy: - dagit
2009-08-25 17:48:36  admin     set     nosy: - simon
2009-08-27 14:07:07  admin     set     nosy: tommy, kowey, markstos, darcs-devel, zooko, quick, thorkilnaur, dmitry.kurochkin