Patch 1375: removed special handling of --to-match from cloneRepos... - Darcs bug tracker

Title	removed special handling of --to-match from cloneRepos...
Superseder		Nosy List	bfrk
Related Issues
Status	accepted	Assigned To
Milestone

Created on 2015-06-22.07:24:18 by bfrk, last changed 2015-10-16.19:16:35 by gh.

Files
File name	Status	Uploaded	Type	Edit	Remove
patch-preview.txt		bfrk, 2015-06-22.07:24:18	text/x-darcs-patch
removed-special-handling-of-__to_match-from-clonerepository.dpatch		bfrk, 2015-06-22.07:24:18	application/x-darcs-patch
unnamed		bfrk, 2015-06-22.07:24:18

See mailing list archives for discussion on individual patches.

Messages
msg18591 (view)	Author: bfrk	Date: 2015-06-22.07:24:18
I guess the code I deleted here is cruft left over from adding the "packs" feature. This should be carefully reviewed and tested a bit more to see if there are unwanted side effects. (I don't expect any, just being careful.) TODO: Verify that the special handling this patch removes is not faster when using 'darcs clone --to-match' with a remote repo that has not been optimized to provide packs (using darcs optimize http). 1 patch for repository http://darcs.net/screened: patch f9c60286ae4e03f4fbab473ced211eec370a1ce0 Author: Ben Franksen <benjamin.franksen@helmholtz-berlin.de> Date: Mon Jun 22 00:04:21 CEST 2015 * removed special handling of --to-match from cloneRepository I made lots of tests and found (a) no difference in the resulting repos (b) the method chosen with toMatch=False is always much faster, (more than twice as fast, in fact) regardless of whether the repo is local or remote (c) --to-patch was not handled when passing the toMatch parameter, causing the slower variant to be chosen /only/ for --to-match Attachments patch-preview.txt removed-special-handling-of-__to_match-from-clonerepository.dpatch unnamed
msg18774 (view)	Author: gh	Date: 2015-10-09.20:30:56
I'm going to look into this. TODO: dig into "darcs log --matches "hunk economical" and mailing list/patch tracker to know why the code was introduced in the first place and if it is really cruft.
msg18777 (view)	Author: gh	Date: 2015-10-10.22:57:49
The patch that introduced this piece of code is from 2008: "optimized get --to-match handling for darcs 1 repositories". and soon the optimization was extended to hashed repositories too. I find the clone code to be very complex as it is now. It's quite difficult to follow the packs/no-packs, lazy/complete and OF/hashed choices among it, so I'm inclined to remove this optimization even if it's useful in some cases. Also it seems that when this "clone --to-match" code is run, two things are never done: copying the `_darcs/prefs/prefs` file and setting the scripts as executable ... This can be fixed but this show how difficult it is to maintain too many code branches.
msg18783 (view)	Author: gh	Date: 2015-10-16.19:14:15
# Overview of current code and one benchmark When cloning without the optimization, we: * copy the pristine tree * copy the patches (allowing CTRL+C) * unapply unneeded patches (if some matcher is given) to go back to the version we want. When cloning with the optimization, we: * apply the patches to the empty state until reaching the state we want, getting needed patches on demand (this is the ame as old-fashioned repository cloning!) Now, let's see the effect of the optimization on one real world repo. Cloning darcsden's repo (1100 patches) and matching on a patch from the middle of the history (patch number ~700): time darcs clone http://hub.darcs.net/simon/darcsden --no-cache --to-match "hash 31c8ee8" With the optimization, cloning takes 20s, without it takes 50s! (This reminds me of the benchmarks we ran on cloning with packs.) Even when matching on the last patch, the optimization wins: time darcs clone http://hub.darcs.net/simon/darcsden --no-cache --to-match "hash ebbeb01" with: 30s without: 50s! # Decision As I said in my previous comment, repo cloning code is awful and complicated. So I'm accepting Ben's patch anyway but not without some discussion/roadmap for the cloning code. # Proposal I think repo cloning code should be broken in two cases named after what they do: (A) get pristine and possibly unapply patches: For the default case (lazy or not) (B) get patches to apply to an empty state: For old fashioned cloning For to-match, can also apply to --tag (currently not the case). As long as we want to support OF repositories we will have code for (B), so it's probably worth it to rewrite it into a good abstraction that would work for all the (B) cases. I'd suggest: * reimplement cloning code into two cases (A) and (B) described above * also end with this nonsense of repeating "identifyRepository" several times during cloning What bothers me is that you could always create repositories with "hard" patches at the beginning and "easy" patches at the end, or the contrary, which can make (A) or (B) more interesting than the other one. So there is no "always better" solution. Maybe we should provide good enough defaults and a manual switch? Like: * when cloning without matcher -> A by default * when cloning with matcher (including --tag) or OF repo-> B by default An then (one further step, needs discussion): * if user provides --pristine-first / --patches-first flag, default behaviour is overrided? And then (maybe this is too crazy, but we kind of do this already with packs): * run A and B in parallel, keep the one that finishes first? :-]
msg18785 (view)	Author: gh	Date: 2015-10-16.19:16:34
Accepted (I propose the discussion should continue at http://bugs.darcs.net/issue2476 )

History
Date	User	Action	Args
2015-06-22 07:24:18	bfrk	create
2015-10-09 20:30:56	gh	set	messages: + msg18774
2015-10-10 22:57:50	gh	set	messages: + msg18777
2015-10-16 19:14:15	gh	set	messages: + msg18783
2015-10-16 19:16:35	gh	set	status: needs-screening -> accepted messages: + msg18785