darcs

Issue 772 bug in get_extra commuting patch

Title: bug in get_extra commuting patch
Priority: bug
Status: resolved
Milestone:
Resolved in:
Superseder: patch ids are not collision-free (issue27)
Nosy List: darcs-devel, dmitry.kurochkin, kowey, lele, thorkilnaur, tommy, zooko
Assigned To:
Topics:

Created on 2008-03-31.14:44:42 by lele, last changed 2009-08-27.13:57:23 by admin.

Messages
msg4126 (view) Author: lele Date: 2008-03-31.14:44:39
Hi all,

I'm facing unexpected trouble trying to merge two different
repositories into one. No matter which direction, or darcs1/darcs2, I
always trigger that "bug in get_extra commuting patch".

I tailorized two different subtrees of a Subversion repository into
two distinct darcs repositories.

Since the two are effectively wired to each other, I'd like to have a
single repository with the two subtrees.

So I basically did:

$ cd /tmp
$ darcs get .../tailorized/repo-A
$ cd repo-A
$ darcs pull .../tailorized/repo-B

and I get the error almost immediately at pull time, with the error
message reporting a patch in repo-A. The same happens if I swap the
order (that is, trying to pull repo-A into repo-B): in this case, the
error message mentions a patch in repo-B.

Then I rebuilt an up-to-date darcs2 binary, and tried the same (with
and without --hashed) with it, obtaining the very same result.

repo-A has 391 patches while repo-B has only 113 and, as I said, the
two sets are by definition completely non-overlapping:

$ ls -l repo-A
drwxrwxr-x 6 lele lele 4096 2008-03-31 15:55 _darcs
drwxrwxr-x 7 lele lele 4096 2008-03-31 15:28 gam-database-pg

$ du -sh repo-A
17M

$ ls -l repo-B
drwxrwxr-x 6 lele lele 4096 2008-03-31 15:55 _darcs
drwxrwxr-x 3 lele lele 4096 2008-03-31 15:29 tools

$ du -sh repo-B
1,4M

As the material is completely under GPL, I have no problem sharing it,
should that help in any way. Please, let me know if there's anything
else I could try.

Thank you in advance,
ciao, lele.
-- 
nickname: Lele Gaifax    | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas    | comincerò ad aver paura di chi mi copia.
lele@nautilus.homeip.net |                 -- Fortunato Depero, 1929.
msg4131 (view) Author: droundy Date: 2008-03-31.14:53:21
On Mon, Mar 31, 2008 at 02:44:42PM -0000, Lele Gaifax wrote:
> Then I rebuilt an up-to-date darcs2 binary, and tried the same (with
> and without --hashed) with it, obtaining the very same result.

Could you try using the --darcs-2 format?

> $ du -sh repo-A
> 17M

It's distinctly possible (in fact, downright likely) that what you're
seeing is an out-of-memory error.  It's a horrible error message for an
out-of-memory error, but given what you describe this bug shouldn't happen
(even with darcs-1).  Anyhow, without seeing the repositories, or the
patches involved in the commutation, this is all I can guess.

> As the material is completely under GPL, I have no problem sharing it,
> should that help in any way. Please, let me know if there's anything
> else I could try.

That would be great, if you could give us a couple of URLs to get from.
-- 
David Roundy
Department of Physics
Oregon State University
_______________________________________________
darcs-devel mailing list
darcs-devel@darcs.net
http://lists.osuosl.org/mailman/listinfo/darcs-devel
msg4134 (view) Author: droundy Date: 2008-03-31.15:20:54
On Mon, 31 Mar 2008 07:45:16 -0700
David Roundy <droundy@darcs.net> wrote:

> On Mon, Mar 31, 2008 at 02:44:42PM -0000, Lele Gaifax wrote:
> > Then I rebuilt an up-to-date darcs2 binary, and tried the same (with
> > and without --hashed) with it, obtaining the very same result.
> 
> Could you try using the --darcs-2 format?

Uhm, not immediately: if I understand correctly, I cannot migrate to
that format, but would have to use "darcs2 init --darcs-2" in the
tailorization step... Am I right?

> It's distinctly possible (in fact, downright likely) that what you're
> seeing is an out-of-memory error.

This seems strange, because I get the error almost immediately,
without any apparent load on the machine...

> 
> > As the material is completely under GPL, I have no problem sharing
> > it, should that help in any way.
> 
> That would be great, if you could give us a couple of URLs to get
> from.

Sorry, here it is:

  http://artiemestieri.tn.it/~lele/issue772.tar.bz2

It contains the two original darcs1 repositories without the pristine
trees.

thank you,
ciao, lele.
-- 
nickname: Lele Gaifax    | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas    | comincerò ad aver paura di chi mi copia.
lele@nautilus.homeip.net |                 -- Fortunato Depero, 1929.
msg4139 (view) Author: droundy Date: 2008-03-31.15:38:34
So the problem is that you've got two changes with identical names (and dates,
etc) that describe different changes:

 Sat Sep 25 14:01:03 PDT 2004  lele
   * Rudimentale indice degli script

If you fix tailor to generate unique ids for patches, that would fix this.

This is a duplicate of issue27.  It's debatable whether this is a bug in darcs
or a bug in tailor.  Zooko would argue, I'm sure, that darcs shouldn't give you
the power to shoot yourself in the foot.  I tend to disagree.  I consider it a
feature that you (as the author of tailor) can precisely specify the patch ID of
patches you're converting.  Anyhow, an easy fix (what Zooko wants us to do in
darcs) is to add a bit of garbage into the long message of each patch.  If you
prefix this garbage with something reasonable, we may even add a feature to hide
that garbage from our users.

David
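
To illustrate the collision described above, here is a toy model in Python. It assumes, purely for illustration, that a patch's identity is a hash of its metadata (author, date, name, long message) and nothing else. The function `patch_id` and the "tailorized from ..." notes are hypothetical; this is not darcs' actual code.

```python
import hashlib

def patch_id(author, date, name, long_message=""):
    """Toy patch identifier: a hash over metadata only, not over the changes."""
    fields = "\n".join([author, date, name, long_message])
    return hashlib.sha1(fields.encode("utf-8")).hexdigest()

# Two different changes, recorded in two repositories, with identical metadata.
meta = ("lele", "Sat Sep 25 14:01:03 PDT 2004", "Rudimentale indice degli script")
id_a = patch_id(*meta)
id_b = patch_id(*meta)
collide = id_a == id_b  # metadata alone cannot tell the two changes apart

# A distinguishing note in the long message separates the identifiers.
id_a_fixed = patch_id(*meta, long_message="tailorized from repo-A")
id_b_fixed = patch_id(*meta, long_message="tailorized from repo-B")
distinct = id_a_fixed != id_b_fixed
```

In this model the extra text in the long message plays the role of the "garbage" mentioned above: any content that differs between the two patches is enough to separate them.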
msg4146 (view) Author: lele Date: 2008-03-31.16:52:41
On Mon, 31 Mar 2008 15:38:35 -0000
David Roundy <bugs@darcs.net> wrote:

> 
> 
> So the problem is that you've got two changes with identical names
> (and dates, etc) that describe different changes:
> 
>  Sat Sep 25 14:01:03 PDT 2004  lele
>    * Rudimentale indice degli script
> 
> If you fix tailor to generate unique ids for patches, that would fix
> this.
> 
> This is a duplicate of issue27.  It's debatable whether this is a bug
> in darcs or a bug in tailor.  Zooko would argue, I'm sure, that darcs
> shouldn't give you the power to shoot yourself in the foot.  I tend
> to disagree.  I consider it a feature that you (as the author of
> tailor) can precisely specify the patch ID of patches you're
> converting.  Anyhow, an easy fix (what Zooko wants us to do in darcs)
> is to add a bit of garbage into the long message of each patch.  If
> you prefix this garbage with something reasonable, we may even add a
> feature to hide that garbage from our users.

Thank you David,

And I now understand better how issue27 was born :)

So, once you know what the problem is, it's very easy to install a
workaround in tailor, just changing the "patch-name-format" option.

Is there any way for darcs to be more precise in its error message?
Could it diagnose that a duplicate id is the underlying reason?

thank you again,
ciao, lele.
-- 
nickname: Lele Gaifax    | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas    | comincerò ad aver paura di chi mi copia.
lele@nautilus.homeip.net |                 -- Fortunato Depero, 1929.
msg4148 (view) Author: droundy Date: 2008-03-31.18:31:58
On Mon, Mar 31, 2008 at 04:52:43PM -0000, Lele Gaifax wrote:
> 
> 
> On Mon, 31 Mar 2008 15:38:35 -0000
> David Roundy <bugs@darcs.net> wrote:
> 
> > 
> > 
> > So the problem is that you've got two changes with identical names
> > (and dates, etc) that describe different changes:
> > 
> >  Sat Sep 25 14:01:03 PDT 2004  lele
> >    * Rudimentale indice degli script
> > 
> > If you fix tailor to generate unique ids for patches, that would fix
> > this.
> > 
> > This is a duplicate of issue27.  It's debatable whether this is a bug
> > in darcs or a bug in tailor.  Zooko would argue, I'm sure, that darcs
> > shouldn't give you the power to shoot yourself in the foot.  I tend
> > to disagree.  I consider it a feature that you (as the author of
> > tailor) can precisely specify the patch ID of patches you're
> > converting.  Anyhow, an easy fix (what Zooko wants us to do in darcs)
> > is to add a bit of garbage into the long message of each patch.  If
> > you prefix this garbage with something reasonable, we may even add a
> > feature to hide that garbage from our users.
> 
> Thank you David,

You're welcome!

> And I now understand better how issue27 was born :)
> 
> So, once you know what the problem is, it's very easy to install a
> workaround in tailor, just changing the "patch-name-format" option.

Indeed.  In fact, in tailor you needn't add random garbage, but could
instead add a little note indicating that the change was generated by
tailor running on a particular repository.  This wouldn't fix all the
issue27 problems (e.g. if one svn repository has two different changes with
identical names and dates), but it would fix this particular problem, and
would also add human-friendly information.

> Is there any way for darcs to be more precise in its error message?
> Could it diagnose that a duplicate id is the underlying reason?

That would be hard.  The trouble is that darcs assumes that two changes
with the same name are the same change.  In this case, darcs then tries to
move those two changes into the same context, but is unable to do so,
because the only common context is the empty repository, and each of these
changes has some sort of a dependency.  It'd be hard for darcs to figure
out that this is what happened (just as it was hard for me to figure out
what had happened, and I'm smarter than darcs is...).  It might be able to
hazard a guess, but most often this particular bug message is actually
related to conflicts.
-- 
David Roundy
Department of Physics
Oregon State University
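
The assumption David describes can be modelled in a few lines. This is only a sketch, assuming each repository is a mapping from patch identifier to patch contents; `common_patches`, `colliding_ids` and the placeholder contents are hypothetical. Real darcs patches change representation when they are commuted, so a plain content comparison like this one would not carry over directly.

```python
def common_patches(repo_a, repo_b):
    """Identifiers present in both repositories, assumed to be the same change."""
    return set(repo_a) & set(repo_b)

def colliding_ids(repo_a, repo_b):
    """Shared identifiers whose contents differ in this toy model."""
    shared = common_patches(repo_a, repo_b)
    return sorted(pid for pid in shared if repo_a[pid] != repo_b[pid])

SHARED_ID = "Sat Sep 25 14:01:03 PDT 2004 lele Rudimentale indice degli script"

# Placeholder contents: the real changes are not reproduced here.
repo_a = {SHARED_ID: "change made in repo-A", "only in A": "another change in A"}
repo_b = {SHARED_ID: "change made in repo-B", "only in B": "another change in B"}

shared = common_patches(repo_a, repo_b)     # treated as already-shared history
collisions = colliding_ids(repo_a, repo_b)  # same identifier, different change
```

Because the identifier matches, the merge logic in this model treats the patch as common history even though the two repositories have nothing in common.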
msg4152 (view) Author: zooko Date: 2008-03-31.19:39:03
On Mar 31, 2008, at 9:38 AM, David Roundy wrote:
>
> This is a duplicate of issue27.  It's debatable whether this is a  
> bug in darcs
> or a bug in tailor.  Zooko would argue, I'm sure, that darcs  
> shouldn't give you
> the power to shoot yourself in the foot.

My argument is that patch ids should be unique, so that it is  
impossible for there to exist two different patches with the same  
patch id.  This is a property that monotone guaranteed, which git  
adopted, and which mercurial and bzr now offer as well.

It seems like it could be a useful property to rely upon.  It could  
theoretically be used by a future version of darcs to  
cryptographically verify the provenance of a repository, the way that  
monotone and the others already do.

The "garbage" that David referred to in his note would be a secure  
hash of the contents of the patch and the context of the patch, just  
as is done in the other revision control tools.

Hopefully it wouldn't need to be encoded into the long patch  
description itself, however, since that could collide with other uses  
of the long patch description.  Hopefully the patch hash information  
could be stored with the patch description in a separate field, just  
like monotone and the others do.

Regards,

Zooko
msg4153 (view) Author: droundy Date: 2008-03-31.20:29:22
On Mon, Mar 31, 2008 at 01:32:43PM -0600, zooko wrote:
> On Mar 31, 2008, at 9:38 AM, David Roundy wrote:
> >This is a duplicate of issue27.  It's debatable whether this is a bug in
> >darcs or a bug in tailor.  Zooko would argue, I'm sure, that darcs
> >shouldn't give you the power to shoot yourself in the foot.
> 
> My argument is that patch ids should be unique, so that it is  
> impossible for there to exist two different patches with the same  
> patch id.  This is a property that monotone guaranteed, which git  
> adopted, and which mercurial and bzr now offer as well.
>
> It seems like it could be a useful property to rely upon.  It could
> theoretically be used by a future version of darcs to cryptographically
> verify the provenance of a repository, the way that monotone and the
> others already do.

You can never rely upon this property in the presence of hostile attackers,
and in the absence of hostile attackers, the existing behavior is
adequate.  I would say that if one developer creates two patches with the
same name at the same time, he's most likely hostile or he's using a
poorly-designed tool.  If one developer creates patches using another
developer's name then he's definitely hostile (or perhaps confused as to
his identity).

> The "garbage" that David referred to in his note would be a secure hash
> of the contents of the patch and the context of the patch, just as is
> done in the other revision control tools.

No, it would be a secure hash of the contents of the patch and its context
*at the time that it's created*, which is something that cannot be verified
or used in any way (except perhaps if you're lucky, or don't use much of
darcs' functionality).  So assuming it's a secure hash, then this is
garbage.

The key mistake you're making is that you seem to assume that we could
check this hash, but it's an uncheckable hash, because there's no reason to
believe we could ever again recreate that context.  So this information is
no more useful than a truly random number.  Its only advantage over a few
bytes from /dev/random would be (a) that it doesn't deplete your entropy
pool and (b) that tools like tailor would generate the same output when run
twice on the same repository.
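
The difference between the two sources of "garbage" can be shown with a small sketch. It assumes a hypothetical `deterministic_salt` that hashes the patch contents together with its context at record time, and compares it with bytes from the operating system's random source. Neither function exists in darcs or tailor, and the sample bytes are made up.

```python
import hashlib
import os

def deterministic_salt(contents: bytes, context: bytes) -> str:
    """Salt derived from the patch and its context at record time."""
    return hashlib.sha256(context + b"\0" + contents).hexdigest()[:16]

def random_salt() -> str:
    """Salt drawn from the OS random source; differs on every run."""
    return os.urandom(8).hex()

# Made-up sample data standing in for a patch and the context it was recorded in.
contents = b"sample patch contents"
context = b"patch-1\npatch-2\n"

first = deterministic_salt(contents, context)
second = deterministic_salt(contents, context)
repeatable = first == second  # a converter rerun on the same input agrees

one_off = random_salt()  # serves equally well for uniqueness, but not repeatable
```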

> Hopefully it wouldn't need to be encoded into the long patch  
> description itself, however, since that could collide with other uses  
> of the long patch description.  Hopefully the patch hash information  
> could be stored with the patch description in a separate field, just  
> like monotone and the others do.

Sorry, it *would* be encoded into the long patch description itself.  As
I've explained to you before.  I'm not going to break
backwards-compatibility.
-- 
David Roundy
Department of Physics
Oregon State University
msg4156 (view) Author: zooko Date: 2008-03-31.20:47:04
> The key mistake you're making is that you seem to assume that we could
> check this hash, but it's an uncheckable hash, because there's no  
> reason to
> believe we could ever again recreate that context.

This is what I meant by "theoretically could be used by a future  
version of darcs".  It is theoretically possible that a future  
version of darcs could get access to the context.

If in the future there were an extension to darcs to provide such  
contexts, then darcs would gain the same provenance guarantee that  
the other decentralized revision control tools offer without losing  
its unique flexibility.

Perhaps such an extension is too difficult to implement, but perhaps  
not.

>  So this information is
> no more useful than a truly random number.  Its only advantage over  
> a few
> bytes from /dev/random would be (a) that it doesn't deplete your  
> entropy
> pool

/dev/urandom suffices for that, and is no less secure than /dev/random.

> and (b) that tools like tailor would generate the same output when run
> twice on the same repository.

This would be a nice property for it to have.

> Sorry, it *would* be encoded into the long patch description  
> itself.  As
> I've explained to you before.  I'm not going to break
> backwards-compatibility.

I see.

Regards,

Zooko
msg4157 (view) Author: droundy Date: 2008-03-31.20:55:39
On Mon, Mar 31, 2008 at 02:40:48PM -0600, zooko wrote:
> >The key mistake you're making is that you seem to assume that we could
> >check this hash, but it's an uncheckable hash, because there's no reason
> >to believe we could ever again recreate that context.
> 
> This is what I meant by "theoretically could be used by a future  
> version of darcs".  It is theoretically possible that a future  
> version of darcs could get access to the context.

No, if a patch in this context is obliterated (and the file describing that
patch is deleted, or this copy of the repository is deleted), then there is
absolutely no way any possible future version of darcs (okay, maybe I
should add that the hard drive was thrown into a volcano) could reconstruct
that context.  Barring a search of all possible patch names that might have
ever been created.
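
A toy model of this argument, assuming a context is simply a set of patch identifiers and that the stored value is a hash over that set (`context_hash` and the identifiers "p1" to "p3" are hypothetical):

```python
import hashlib

def context_hash(patch_ids):
    """Hash over a set of patch identifiers, in a fixed order."""
    h = hashlib.sha256()
    for pid in sorted(patch_ids):
        h.update(pid.encode("utf-8") + b"\n")
    return h.hexdigest()

# Context in which some patch was recorded, and the hash stored with it.
context_at_record_time = {"p1", "p2", "p3"}
stored = context_hash(context_at_record_time)

# Later, p2 is obliterated and no copy of it survives anywhere.
surviving = context_at_record_time - {"p2"}
can_verify = context_hash(surviving) == stored  # the original input is gone
```

Without p2 there is nothing left from which the stored value could be recomputed, so in this model it can no longer be checked.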

> If in the future there were an extension to darcs to provide such  
> contexts, then darcs would gain the same provenance guarantee that  
> the other decentralized revision control tools offer without losing  
> its unique flexibility.

As mentioned above, there is no possible way such an extension could be
written.

> Perhaps such an extension is too difficult to implement, but perhaps  
> not.

No, it's not difficult.  It's impossible.

If we modified a future version of darcs to store all patches ever
recorded, and transmit all such patches to every other repository it comes
in contact with, then this hash could be useful for verification purposes.
But until we make that change, there's just no reason to store it except as
repeatable pseudorandom garbage.
-- 
David Roundy
Department of Physics
Oregon State University
msg4161 (view) Author: lele Date: 2008-03-31.22:58:09
On Mon, 31 Mar 2008 18:32:00 -0000
David Roundy <bugs@darcs.net> wrote:

> > So, once you know what the problem is, it's very easy to install a
> > workaround in tailor, just changing the "patch-name-format" option.
> 
> Indeed.  In fact, in tailor you needn't add random garbage, but could
> instead add a little note indicating that the change was generated by
> tailor running on a particular repository.  This wouldn't fix all the
> issue27 problems (e.g. if one svn repository has two different
> changes with identical names and dates), but it would fix this
> particular problem, and would also add human-friendly information.

Well, I think the patch-name-format option offers a good workaround to
that problem as well: by default it rewrites the upstream changelog
prepending something like "[upstream-svn-repo @ 1234]" (where 1234 is
the upstream svn revid) to its text, so effectively those different
changes with identical names and dates [and author, I may add] would
produce /different/ darcs hashes.
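
As a rough illustration of why the prefix helps, the sketch below builds patch names from a Python %-style template. The template strings, `format_patch_name`, and the revision numbers 101 and 57 are assumptions made for this example; they mirror the "[upstream-svn-repo @ 1234]" shape mentioned above but are not copied from tailor's source or configuration.

```python
def format_patch_name(template, project, revision, firstlogline):
    """Fill a %-style template with the values of one upstream revision."""
    values = {"project": project, "revision": revision, "firstlogline": firstlogline}
    return template % values

# Hypothetical templates: one with a distinguishing prefix, one without.
WITH_PREFIX = "[%(project)s @ %(revision)s] %(firstlogline)s"
WITHOUT_PREFIX = "%(firstlogline)s"

log = "Rudimentale indice degli script"

# Without the prefix, the same log line yields the same patch name twice.
plain_a = format_patch_name(WITHOUT_PREFIX, "repo-A", 101, log)
plain_b = format_patch_name(WITHOUT_PREFIX, "repo-B", 57, log)

# With the prefix, the project and revision keep the names apart.
named_a = format_patch_name(WITH_PREFIX, "repo-A", 101, log)
named_b = format_patch_name(WITH_PREFIX, "repo-B", 57, log)
```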

I experienced the problem myself exactly because, for the very first
time, I changed that option to avoid that prefix :-)

So the solution for both issues, at least from the tailor point of
view, is just a matter of differentiating in some way the
patch-name-format option.... that is, trusting tailor's default ;-)

I'll add a note about this in the README.

ciao, lele.
-- 
nickname: Lele Gaifax    | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas    | comincerò ad aver paura di chi mi copia.
lele@nautilus.homeip.net |                 -- Fortunato Depero, 1929.
msg4164 (view) Author: droundy Date: 2008-04-01.13:11:23
On Mon, Mar 31, 2008 at 10:58:11PM -0000, Lele Gaifax wrote:
> So the solution for both issues, at least from the tailor point of
> view, is just a matter of differentiating in some way the
> patch-name-format option.... that is, trusting tailor's default ;-)

Ah, that explains why you haven't fixed this!  :) (i.e. it's already been
fixed by default.)

> I'll add a note about this in the README.

Thanks!
-- 
David Roundy
Department of Physics
Oregon State University
History
Date User Action Args
2008-03-31 14:44:42  lele     create
2008-03-31 14:53:23  droundy  set  status: unread -> unknown
                                   nosy: + darcs-devel, droundy
                                   messages: + msg4131
2008-03-31 14:53:50  droundy  set  priority: bug
                                   nosy: droundy, tommy, beschmi, kowey, darcs-devel, lele
2008-03-31 14:53:59  droundy  set  nosy: - darcs-devel
2008-03-31 15:20:56  droundy  set  nosy: + darcs-devel
                                   messages: + msg4134
2008-03-31 15:38:35  droundy  set  nosy: droundy, tommy, beschmi, kowey, darcs-devel, lele
                                   messages: + msg4139
2008-03-31 15:38:51  droundy  set  status: unknown -> duplicate
                                   nosy: droundy, tommy, beschmi, kowey, darcs-devel, lele
                                   superseder: + patch ids are not collision-free
2008-03-31 16:52:43  lele     set  nosy: droundy, tommy, beschmi, kowey, darcs-devel, lele
                                   messages: + msg4146
2008-03-31 18:32:00  droundy  set  nosy: droundy, tommy, beschmi, kowey, darcs-devel, lele
                                   messages: + msg4148
2008-03-31 19:39:05  zooko    set  nosy: + zooko
                                   messages: + msg4152
2008-03-31 20:29:24  droundy  set  nosy: droundy, tommy, beschmi, kowey, darcs-devel, zooko, lele
                                   messages: + msg4153
2008-03-31 20:33:04  droundy  set  nosy: - droundy, darcs-devel
2008-03-31 20:47:05  zooko    set  nosy: + darcs-devel, droundy
                                   messages: + msg4156
2008-03-31 20:55:40  droundy  set  nosy: droundy, tommy, beschmi, kowey, darcs-devel, zooko, lele
                                   messages: + msg4157
2008-03-31 22:58:11  lele     set  nosy: droundy, tommy, beschmi, kowey, darcs-devel, zooko, lele
                                   messages: + msg4161
2008-04-01 13:11:24  droundy  set  nosy: droundy, tommy, beschmi, kowey, darcs-devel, zooko, lele
                                   messages: + msg4164
2008-04-01 13:16:42  droundy  set  status: duplicate -> resolved
                                   nosy: droundy, tommy, beschmi, kowey, darcs-devel, zooko, lele
2009-08-06 17:57:45  admin    set  nosy: + markstos, jast, Serware, dmitry.kurochkin, dagit, mornfall, simon, thorkilnaur, - droundy, lele
2009-08-06 21:01:29  admin    set  nosy: - beschmi
2009-08-10 22:19:03  admin    set  nosy: + lele, - markstos, jast, dagit, Serware, mornfall
2009-08-25 18:08:15  admin    set  nosy: - simon
2009-08-27 13:57:23  admin    set  nosy: tommy, kowey, darcs-devel, zooko, lele, thorkilnaur, dmitry.kurochkin