darcs

Patch 1375 removed special handling of --to-match from cloneRepos...

Title removed special handling of --to-match from cloneRepos...
Superseder Nosy List bfrk
Related Issues
Status accepted Assigned To
Milestone

Created on 2015-06-22.07:24:18 by bfrk, last changed 2015-10-16.19:16:35 by gh.

Files
File name Status Uploaded Type Edit Remove
patch-preview.txt bfrk, 2015-06-22.07:24:18 text/x-darcs-patch
removed-special-handling-of-__to_match-from-clonerepository.dpatch bfrk, 2015-06-22.07:24:18 application/x-darcs-patch
unnamed bfrk, 2015-06-22.07:24:18
See mailing list archives for discussion on individual patches.
Messages
msg18591 (view) Author: bfrk Date: 2015-06-22.07:24:18
I guess the code I deleted here is cruft left over from adding the "packs"
feature. This should be carefully reviewed and tested a bit more to see if
there are unwanted side effects. (I don't expect any, just being careful.)

TODO: Verify that the special handling this patch removes is not faster when
using 'darcs clone --to-match' with a remote repo that has not been
optimized to provide packs (using darcs optimize http).

1 patch for repository http://darcs.net/screened:

patch f9c60286ae4e03f4fbab473ced211eec370a1ce0
Author: Ben Franksen <benjamin.franksen@helmholtz-berlin.de>
Date:   Mon Jun 22 00:04:21 CEST 2015
  * removed special handling of --to-match from cloneRepository
  
  I made lots of tests and found
  (a) no difference in the resulting repos
  (b) the method chosen with toMatch=False is always much faster, (more than
      twice as fast, in fact) regardless of whether the repo is local or
      remote
  (c) --to-patch was not handled when passing the toMatch parameter, causing
      the slower variant to be chosen /only/ for --to-match
Attachments
msg18774 (view) Author: gh Date: 2015-10-09.20:30:56
I'm going to look into this.

TODO: dig into "darcs log --matches "hunk economical" and mailing
list/patch tracker to know why the code was introduced in the first
place and if it is really cruft.
msg18777 (view) Author: gh Date: 2015-10-10.22:57:49
The patch that introduced this piece of code is from 2008:
"optimized get --to-match handling for darcs 1 repositories".
and soon the optimization was extended to hashed repositories too.

I find the clone code to be very complex as it is now. It's quite
difficult to follow the packs/no-packs, lazy/complete and OF/hashed
choices among it, so I'm inclined to remove this optimization even if
it's useful in some cases.

Also it seems that when this "clone --to-match" code is run,
two things are never done: copying the `_darcs/prefs/prefs` file
and setting the scripts as executable ... This can be fixed but this
show how difficult it is to maintain too many code branches.
msg18783 (view) Author: gh Date: 2015-10-16.19:14:15
# Overview of current code and one benchmark

When cloning without the optimization, we:

* copy the pristine tree
* copy the patches (allowing CTRL+C)
* unapply unneeded patches (if some matcher is given)
  to go back to the version we want.

When cloning with the optimization, we:

* apply the patches to the empty state until reaching the state we want,
   getting needed patches on demand
  (this is the ame as old-fashioned repository cloning!)

Now, let's see the effect of the optimization on one real world repo.

Cloning darcsden's repo (1100 patches) and matching on a
patch from the middle of the history (patch number ~700):

    time darcs clone http://hub.darcs.net/simon/darcsden  --no-cache
--to-match "hash 31c8ee8"

With the optimization, cloning takes 20s, without it takes 50s!

(This reminds me of the benchmarks we ran on cloning with packs.)

Even when matching on the last patch, the optimization wins:

    time darcs clone http://hub.darcs.net/simon/darcsden  --no-cache
--to-match "hash ebbeb01"
    
with: 30s without: 50s!

# Decision

As I said in my previous comment, repo cloning code is awful and
complicated. So I'm accepting Ben's patch anyway but not without
some discussion/roadmap for the cloning code.

# Proposal

I think repo cloning code should be broken in two cases named after
*what they do*:

(A) get pristine and possibly unapply patches:
        
    For the default case (lazy or not)

(B) get patches to apply to an empty state:

    For old fashioned cloning
    For to-match, can also apply to --tag (currently not the case).

As long as we want to support OF repositories we will have code for
(B), so it's probably worth it to rewrite it into a good abstraction
that would work for all the (B) cases.

I'd suggest:

* reimplement cloning code into two cases (A) and (B) described above
* also end with this nonsense of repeating "identifyRepository" several
  times during cloning

What bothers me is that you could always create repositories with "hard"
patches at the beginning and "easy" patches at the end, or the contrary,
which can make (A) or (B) more interesting than the other one.
So there is no "always better" solution.

Maybe we should provide good enough defaults and a manual switch?

Like:

* when cloning without matcher -> A by default
* when cloning with matcher (including --tag)  or OF repo-> B by default

An then (one further step, needs discussion):

* if user provides --pristine-first / --patches-first flag, default
  behaviour is overrided?

And then (maybe this is too crazy, but we kind of do this already with
packs):

* run A and B in parallel, keep the one that finishes first? :-]
msg18785 (view) Author: gh Date: 2015-10-16.19:16:34
Accepted (I propose the discussion should continue at
http://bugs.darcs.net/issue2476 )
History
Date User Action Args
2015-06-22 07:24:18bfrkcreate
2015-10-09 20:30:56ghsetmessages: + msg18774
2015-10-10 22:57:50ghsetmessages: + msg18777
2015-10-16 19:14:15ghsetmessages: + msg18783
2015-10-16 19:16:35ghsetstatus: needs-screening -> accepted
messages: + msg18785