darcs

Issue 1039 wish: detect seemingly unrelated repositories

Title: wish: detect seemingly unrelated repositories
Priority: feature
Status: resolved
Milestone: 2.1.x
Resolved in:
Superseder:
Nosy List: Serware, arjanb, darcs-devel, dmitry.kurochkin, ganesh, jaredj, kowey, simonpj, thorkilnaur
Assigned To:
Topics: ProbablyEasy, UI

Created on 2008-08-28.08:40:45 by kowey, last changed 2010-06-15.21:48:13 by admin.

Messages
msg5751 Author: kowey Date: 2008-08-28.08:40:41
This seems easy enough that we should consider doing it for 2.0.3.

Quoting Jason from issue1026:

The only other thing I can think of to help this bug is to add a
feature where you put a "unique" token somewhere in _darcs/*, perhaps
in _darcs/format, that is assigned during 'init'.  If you try to merge
two repositories that vary by token, then darcs says, "These
repositories seem to be unrelated, proceed? [y/N]"
msg5754 Author: kowey Date: 2008-08-28.09:47:35
Quoting Simon PJ from issue1026:

Something like that, yes.  The point is this: if darcs falls over saying "bug in
get_extra" then my reaction is "Darcs is unreliable and I should stop using it".
 If it says politely "You are trying to merge patches from one repository into
an utterly different one; you've probably got the wrong prefs/defaultrepo" or
something like that, then I think "Oh I'm being an idiot, how great Darcs is".

You see the difference?  How it's achieved is a different matter.

Incidentally, even if Darcs *doesn't* crash, I *still don't* want to
accidentally merge 10,000 patches from ghc's repo into libraries/array.  Yes, it
might in principle be do-able, but I really want Darcs to say "this looks silly
to me, are you really sure you want to do this?".
msg6019 Author: droundy Date: 2008-09-16.23:03:46
Note that this would also be solvable by simple heuristics, such as comparing
the overlap of patches.  Two repositories with no common patches are by
definition unrelated, and unless one of them is empty, you probably don't want
to be pulling between them.  It's something like an O(1) check in the common
case, since you probably only need to check the very first patches of the two
repositories (which does require reading all the inventories, unfortunately,
because of the way they're strung together).  But just comparing the most recent
inventories will 99.99% of the time tell you that the two repositories are related.

In other words, I suspect that adding a unique identifier is over-engineering this.
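
A minimal sketch of the overlap heuristic described in this message, assuming patch identities can be modeled as plain strings (darcs itself uses PatchInfo). The names `PatchId` and `seeminglyUnrelated` are hypothetical and are not darcs functions:

```haskell
-- Hypothetical model: a patch identity is a plain string here,
-- standing in for darcs' PatchInfo.
type PatchId = String

-- Two repositories seem unrelated when neither is empty and they
-- share no patch identity at all.
seeminglyUnrelated :: [PatchId] -> [PatchId] -> Bool
seeminglyUnrelated ours theirs =
  not (null ours)
    && not (null theirs)
    && not (any (`elem` theirs) ours)
```

An empty repository on either side never triggers the warning, which matches the "unless one of them is empty" caveat above.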
msg6022 Author: dmitry.kurochkin Date: 2008-09-16.23:59:17
I like this idea better than the unique id approach. But I know nothing about darcs
internals like Repository and Patch. I will take a look at it. But it is likely
that someone else will have to implement this...

It would be really great if there were docs (or papers, or presentations) on
main darcs concepts like repository, inventory, patch. Are there any?

Regards,
  Dmitry
msg6023 Author: dagit Date: 2008-09-17.00:17:16
This wishlist item was generated from issue1026, so we should give that issue
special consideration in the implementation we take here.

It seems that because of issue27 the heuristic proposed wouldn't be sufficient
unless we also compare patch contents when the patchinfos match.

See issue1026 for more details.  The summary is that in issue1026 darcs believed
the repositories to share patches when in fact they had some common patchinfos
but did not have common patches.
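
To illustrate the distinction drawn here, the following is a sketch only, under the assumption that a patch can be modeled as a name plus its contents (both plain strings). The `Patch` type and the functions `nameClashes` and `commonPatches` are hypothetical, not part of darcs:

```haskell
-- Hypothetical model: the name stands in for a PatchInfo, the body
-- for the patch contents.
data Patch = Patch { patchName :: String, patchBody :: String }
  deriving (Eq, Show)

-- Pairs of patches that share a name but differ in contents: the
-- situation summarized above for issue1026.
nameClashes :: [Patch] -> [Patch] -> [(Patch, Patch)]
nameClashes ours theirs =
  [ (p, q)
  | p <- ours
  , q <- theirs
  , patchName p == patchName q
  , patchBody p /= patchBody q
  ]

-- Patches genuinely in common: same name and same contents.
commonPatches :: [Patch] -> [Patch] -> [Patch]
commonPatches ours theirs = filter (`elem` theirs) ours
```

A heuristic that compares names alone would count every pair returned by `nameClashes` as shared history, which is why contents would also need checking in that case.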
msg6032 Author: droundy Date: 2008-09-17.15:20:59
Jason: Okay, I'm breaking down and implementing a fix for issue27, which means
that we can just check for overlap of patch IDs.

Dmitry: there aren't really any docs, but it's pretty simple.  Each patch has a
"name" which is a PatchInfo, and these names are stored in the inventory, along
with pointers to the patch contents.  You can get the list of all patch names in
a repository with:

    ps <- read_repo repository
    let pinfos = mapRL (mapRL info) ps

so now pinfos will be a [[PatchInfo]] including all patches in the repository
starting with the most recent.  It's a list of lists because the inventory is
broken into multiple files to save bandwidth (since we often only need look at
the most recent patches) so you'd want to look for overlap between the first
elements first, and so on.

P.S. A Repository is just a data structure that holds whatever is needed to read
from a repository.
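
The "first elements first, and so on" search could look roughly like the sketch below. It assumes each repository is given as a list of inventory chunks, most recent first, as in the [[PatchInfo]] described above; `shareAnyPatch` is a hypothetical name and the element type is left generic rather than tied to darcs' own types:

```haskell
-- Look for any common patch, reading one more chunk from each side
-- per step. Older chunks are only forced if no overlap has been
-- found among the chunks read so far.
shareAnyPatch :: Eq a => [[a]] -> [[a]] -> Bool
shareAnyPatch = go [] []
  where
    -- seenOurs / seenTheirs hold every patch read so far per side.
    go _ _ [] [] = False
    go seenOurs seenTheirs ours theirs =
      let (o, ours')   = nextChunk ours
          (t, theirs') = nextChunk theirs
          seenOurs'    = o ++ seenOurs
          seenTheirs'  = t ++ seenTheirs
      in any (`elem` seenTheirs') o
           || any (`elem` seenOurs') t
           || go seenOurs' seenTheirs' ours' theirs'

    -- Split off the next chunk; an exhausted side yields nothing.
    nextChunk []       = ([], [])
    nextChunk (c : cs) = (c, cs)
```

Because `||` short-circuits, two related repositories whose most recent chunks already overlap are recognized without touching the older chunks; only the unrelated case has to read everything.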
msg6034 Author: dagit Date: 2008-09-17.17:57:19
On Wed, Sep 17, 2008 at 8:21 AM, David Roundy <bugs@darcs.net> wrote:
>
> David Roundy <droundy@darcs.net> added the comment:
>
> Jason: Okay, I'm breaking down and implementing a fix for issue27, which means
> that we can just check for overlap of patch IDs.

David,

When you get a chance could you please update issue27 describing your
plan of attack for fixing it?  I would be interested in reading about
it (maybe I could also help as time permits).  Also, do you think you
could comment on how the fix will apply to existing repos like the GHC
repo mentioned in issue1026?

Thanks!
Jason
msg6035 Author: droundy Date: 2008-09-17.18:36:30
On Wed, Sep 17, 2008 at 10:57:10AM -0700, Jason Dagit wrote:
> On Wed, Sep 17, 2008 at 8:21 AM, David Roundy <bugs@darcs.net> wrote:
> >
> > David Roundy <droundy@darcs.net> added the comment:
> >
> > Jason: Okay, I'm breaking down and implementing a fix for issue27, which means
> > that we can just check for overlap of patch IDs.
> 
> David,
> 
> When you get a chance could you please update issue27 describing your
> plan of attack for fixing it?  I would be interested in reading about
> it (maybe I could also help as time permits).  Also, do you think you
> could comment on how the fix will apply to existing repos like the GHC
> repo mentioned in issue1026?

I posted the patch to darcs-users.  Did it fail to arrive? It's under
100 lines of code, so I suspect it'd be easiest to just read it.  All
I do is throw in some random junk in the long comment of new
PatchInfos.  It's not going to help old broken repositories like ghc
has.  Fortunately, they'll soon be dropping these repositories.

David
msg6037 Author: ganesh Date: 2008-09-17.21:43:14
I often use darcs init and then darcs pull rather than darcs get, and also share
common setup patches (like versioning the boring file) between different
repositories. So both the "unique token" and the "no common patches" heuristics
wouldn't work very well in my use cases. It's not a big deal though.
msg6038 Author: droundy Date: 2008-09-17.22:23:21
On Wed, Sep 17, 2008 at 5:43 PM, Ganesh Sittampalam <bugs@darcs.net> wrote:
>
> Ganesh Sittampalam <ganesh@earth.li> added the comment:
>
> I often use darcs init and then darcs pull rather than darcs get, and also share
> common setup patches (like versioning the boring file) between different
> repositories. So both the "unique token" and the "no common patches" heuristics
> wouldn't work very well in my use cases. It's not a big deal though.

The "no common patches" heuristic would obviously need to check
whether there are any patches at all.  That's not a hard check.

David
msg6039 Author: ganesh Date: 2008-09-17.22:26:23
"no common patches" would erroneously consider almost entirely unrelated
repositories of mine as related, because they happen to share the "set up
boring" patches.
msg6043 Author: kowey Date: 2008-09-18.09:03:59
For what it's worth, I find Jason's token based approach to be simpler and more
predictable (from the user's point of view).

I was thinking that we do not have to make any guarantees about the repository
identifier; it could just be a text file (say _darcs/repo-id) that can be
created, modified, deleted by the user with no ill effects. To be helpful, darcs
could create a default token for you when you init.

What I like about the id approach is that the user has control over the whole
process.  You want to make this repo have the same id as the other one, fine,
just copy the ancestry token.  The only stumbling block is that users may not
realise how simple, stupid (this is a compliment) and flexible the approach is.
But maybe that can be solved with documentation or a UI message, like

"If you want darcs to treat this repository as related, just copy the
_darcs/repo-id file"

(Here we are conspicuously NOT doing it for the user so as to de-mystify the
approach)
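
A sketch of how such an optional identifier file might be read and compared, assuming the _darcs/repo-id path suggested above. The names `Verdict`, `readRepoId` and `compareRepoIds` are hypothetical; this is not how darcs is implemented:

```haskell
import Control.Exception (IOException, try)
import Data.Char (isSpace)

-- Outcome of comparing two optional identifiers.
data Verdict = Related | SeeminglyUnrelated | Unknown
  deriving (Eq, Show)

-- Read the identifier file if it exists. A missing or unreadable
-- file yields Nothing, so deleting it has no ill effects.
readRepoId :: FilePath -> IO (Maybe String)
readRepoId repoDir = do
  let path = repoDir ++ "/_darcs/repo-id"
  result <- try (readFile path >>= \s -> length s `seq` pure s)
              :: IO (Either IOException String)
  pure (either (const Nothing) (Just . stripEnd) result)
  where
    -- Drop trailing whitespace such as a final newline.
    stripEnd = reverse . dropWhile isSpace . reverse

-- Only warn when both sides carry an identifier and they differ.
compareRepoIds :: Maybe String -> Maybe String -> Verdict
compareRepoIds (Just a) (Just b)
  | a == b    = Related
  | otherwise = SeeminglyUnrelated
compareRepoIds _ _ = Unknown
```

Treating a missing identifier as `Unknown` rather than as a mismatch keeps the file optional: the user can create, copy or delete it freely, and only two differing identifiers would prompt a question.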
msg6081 Author: dmitry.kurochkin Date: 2008-09-21.21:56:51
The following patch updated the status of issue1039 to be resolved:

* Resolve issue1039: detect seemingly unrelated repositories when doing push, pull and send.
History
Date | User | Action | Args
2008-08-28 08:40:45 | kowey | create |
2008-08-28 08:45:02 | kowey | set | topic: + ProbablyEasy; nosy: + jaredj
2008-08-28 09:07:18 | kowey | set | status: unread -> needs-reproduction; nosy: kowey, darcs-devel, dagit, simonpj, arjanb, thorkilnaur, jaredj, serware, Serware
2008-08-28 09:47:41 | kowey | set | nosy: kowey, darcs-devel, dagit, simonpj, arjanb, thorkilnaur, jaredj, serware, Serware; messages: + msg5754
2008-08-28 09:57:37 | kowey | set | topic: + UI; nosy: kowey, darcs-devel, dagit, simonpj, arjanb, thorkilnaur, jaredj, serware, Serware
2008-09-16 23:03:53 | droundy | set | nosy: + dmitry.kurochkin, droundy; messages: + msg6019
2008-09-16 23:59:23 | dmitry.kurochkin | set | nosy: droundy, kowey, darcs-devel, dagit, simonpj, arjanb, thorkilnaur, jaredj, dmitry.kurochkin, serware, Serware; messages: + msg6022
2008-09-17 00:17:27 | dagit | set | nosy: droundy, kowey, darcs-devel, dagit, simonpj, arjanb, thorkilnaur, jaredj, dmitry.kurochkin, serware, Serware; messages: + msg6023
2008-09-17 15:21:06 | droundy | set | nosy: droundy, kowey, darcs-devel, dagit, simonpj, arjanb, thorkilnaur, jaredj, dmitry.kurochkin, serware, Serware; messages: + msg6032
2008-09-17 17:57:22 | dagit | set | nosy: droundy, kowey, darcs-devel, dagit, simonpj, arjanb, thorkilnaur, jaredj, dmitry.kurochkin, serware, Serware; messages: + msg6034
2008-09-17 18:36:33 | droundy | set | nosy: droundy, kowey, darcs-devel, dagit, simonpj, arjanb, thorkilnaur, jaredj, dmitry.kurochkin, serware, Serware; messages: + msg6035
2008-09-17 21:43:21 | ganesh | set | nosy: + ganesh; messages: + msg6037
2008-09-17 22:23:23 | droundy | set | nosy: droundy, kowey, darcs-devel, dagit, ganesh, simonpj, arjanb, thorkilnaur, jaredj, dmitry.kurochkin, serware, Serware; messages: + msg6038
2008-09-17 22:26:31 | ganesh | set | nosy: droundy, kowey, darcs-devel, dagit, ganesh, simonpj, arjanb, thorkilnaur, jaredj, dmitry.kurochkin, serware, Serware; messages: + msg6039
2008-09-18 09:04:06 | kowey | set | nosy: droundy, kowey, darcs-devel, dagit, ganesh, simonpj, arjanb, thorkilnaur, jaredj, dmitry.kurochkin, serware, Serware; messages: + msg6043
2008-09-21 21:56:53 | dmitry.kurochkin | set | status: needs-reproduction -> resolved-in-unstable; nosy: + simon; messages: + msg6081
2009-04-22 03:33:20 | twb | set | status: resolved-in-unstable -> resolved; nosy: droundy, kowey, darcs-devel, dagit, ganesh, simonpj, simon, arjanb, thorkilnaur, jaredj, dmitry.kurochkin, serware, Serware
2009-08-06 18:00:16 | admin | set | nosy: + markstos, jast, zooko, mornfall, tommy, beschmi, - droundy, ganesh, simonpj, arjanb, jaredj, serware
2009-08-06 21:12:01 | admin | set | nosy: - beschmi
2009-08-10 21:47:50 | admin | set | nosy: + serware, ganesh, simonpj, arjanb, jaredj, - tommy, markstos, zooko, jast, mornfall
2009-08-10 23:44:04 | admin | set | nosy: - dagit
2009-08-25 17:25:07 | admin | set | nosy: - simon
2009-08-27 14:18:56 | admin | set | nosy: kowey, darcs-devel, ganesh, simonpj, arjanb, thorkilnaur, jaredj, dmitry.kurochkin, serware, Serware
2009-10-23 22:44:28 | admin | set | nosy: - Serware
2009-10-23 23:28:08 | admin | set | nosy: + Serware, - serware
2010-05-31 14:13:17 | kowey | link | issue1855 superseder
2010-06-15 21:48:12 | admin | set | milestone: 2.1.x
2010-06-15 21:48:13 | admin | set | topic: - Target-2.1