Created on 2008-05-18.12:10:12 by btcoburn, last changed 2017-07-30.23:52:12 by gh.
msg4746 (view) |
Author: btcoburn |
Date: 2008-05-18.12:10:07 |
|
Patch hashes and file names no longer match in darcs2 and hashed format
repositories. This makes it impossible to lookup patch files directly from
"darcs changes --xml-output".
For example a patch with darcs hash
20080128005838-05dcb-a7c228b57b2e79e7bffc030afe4a0779e4ffd17b now has file name
0000040078-73e502b633fb4b4cb3153793cd948fa2b95bc3f05ff570c1a7440a93340f60ea.
Please update the XML output for darcs2 & hashed repos. I suggest adding a
"patch-file" attribute to the "patch" element. Also, the "hash" attribute should
no longer end in ".gz" as this is misleading. Thanks.
|
msg4747 (view) |
Author: kowey |
Date: 2008-05-18.12:14:02 |
|
Lele: does this sound like something you might be interested in working on?
Feel free to unassign yourself, of course! (or I'll do so if I don't hear from
you in a while)
|
msg4755 (view) |
Author: lele |
Date: 2008-05-18.18:06:39 |
|
On Sun, 18 May 2008 12:14:07 -0000
Eric Kow <bugs@darcs.net> wrote:
> Lele: does this sound like something you might be interested in
> working on? Feel free to unassign yourself, of course! (or I'll do
> so if I don't hear from you in a while)
I spent a few rainy-hours on the issue, basically finding how the
filename gets computed, and then trying to augment Info.lhs::to_xml to
show it. But, even if simple enough, I wasn't able to workaround the
"Module imports form a cycle for modules" I got.
This is where I am right now:
hunk ./src/Darcs/Patch/Info.lhs 37
+import Darcs.Patch.Core ( infopatch )
+import Crypt.SHA256 ( sha256sum )
hunk ./src/Darcs/Patch/Info.lhs 145
+ <+> text "file='" <> text (make_patch_filename pi) <> text "'>"
hunk ./src/Darcs/Patch/Info.lhs 193
+make_patch_filename :: PatchInfo -> String
+make_patch_filename pi = hash
+ where ps = infopatch pi
+ hash = case show (lengthPS ps) of
+ x | l > 10 -> sha256sum ps
+ | otherwise -> take (10-l) (repeat '0') ++ x ++'-':sha256sum ps
+ where l = length x
+
that, as you can see, is little more than a cut&paste from
Darcs/Repository/Prefs.lhs, which I'm not even sure I understand :-)
Can some kind soul explain me
a) the meaning of the "x | l > 10" expression in the case stmt?
b) point me in the right direction to avoid the circularity?
Thank you, first of all for the encouragement!
ciao, lele.
--
nickname: Lele Gaifax | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas | comincerò ad aver paura di chi mi copia.
lele@nautilus.homeip.net | -- Fortunato Depero, 1929.
|
msg4757 (view) |
Author: lele |
Date: 2008-05-19.07:57:40 |
|
Hi,
the code below at least compiles and produces something...
hunk ./src/Darcs/Patch/Info.lhs 36
+import Crypt.SHA256 ( sha256sum )
hunk ./src/Darcs/Patch/Info.lhs 144
+ <+> text "file='" <> text (make_patch_filename pi) <> text "'>"
hunk ./src/Darcs/Patch/Info.lhs 192
+make_patch_filename :: PatchInfo -> String
+make_patch_filename _ =
+ let pc = packString "lele" -- of course this should be the actual patch content
+ cl = lengthPS pc
+ l = show cl
+ ll = length l
+ in
+ if ll>10 then sha256sum pc -- ?? so, beyond 10Gb, just the checksum ??
+ else take (10-ll) (repeat '0') ++ l ++ '-' : sha256sum pc
+
I'm still missing the key point, that is, obtaining the actual patch
contents given its PatchInfo. Another point that needs to be cleared
is the limit on 10Gb, beyond that the names is composed only by the
checksum.
Thanks in advance for any help,
ciao, lele.
--
nickname: Lele Gaifax | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas | comincerò ad aver paura di chi mi copia.
lele@nautilus.homeip.net | -- Fortunato Depero, 1929.
|
msg4760 (view) |
Author: kowey |
Date: 2008-05-19.08:50:00 |
|
> + ll = length l
> + in
> + if ll>10 then sha256sum pc -- ?? so, beyond 10Gb, just the checksum ??
> + else take (10-ll) (repeat '0') ++ l ++ '-' : sha256sum pc
Haha, it took me a while to figure out why you were talking about 10
/giga/bytes. Now I understand that it's because we're counting the
number of digits used to represent the string length (which we might
as well do since we have to write the length out as a string anyway).
Looking in darcs changes, I see that David added this string length
code to help deal with the timestamp issue. I am guessing here that
for some reason, we want these file length strings to be uniform
(notice the padding?) and that we use just use arbitrary cutoff (10
digits) to do this. I'm not sure though. David: could you explain
why we stop using the file length on >= 10G?
Tue Mar 4 17:11:20 GMT 2008 David Roundy <droundy@darcs.net>
* add option to store file lengths with hashes.
This is a maneuver towards alleaviating the trouble with timestamps, since
we can quickly check if a file length has changed.
By the way, Lele you should consider adding an entry to
http://wiki.darcs.net/index.html/DarcsInternals explaining how patch
hashes and filenames are computed (with a historical perspective)
> I'm still missing the key point, that is, obtaining the actual patch
> contents given its PatchInfo
I can't think of a way to do this. It may be necessary to think about
where the code for generating these XML lives. Might have to bump it
up now that it (presumably) needs to know about the patch contents as
well as the metadata. It could be useful to look at the code that
writes the inventory and see what it does.
|
msg4788 (view) |
Author: lele |
Date: 2008-05-20.12:42:12 |
|
On Mon, 19 May 2008 08:50:03 -0000
Eric Kow <bugs@darcs.net> wrote:
> By the way, Lele you should consider adding an entry to
> http://wiki.darcs.net/index.html/DarcsInternals explaining how patch
> hashes and filenames are computed (with a historical perspective)
http://wiki.darcs.net/index.html/NamedPatch
ciao, lele.
--
nickname: Lele Gaifax | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas | comincerò ad aver paura di chi mi copia.
lele@nautilus.homeip.net | -- Fortunato Depero, 1929.
|
msg5926 (view) |
Author: kowey |
Date: 2008-09-07.10:50:30 |
|
So I now understand this issue better.
There are two hashes that darcs computes for a patch.
The patch hash, which is based exclusively on its metadata (this allows named
patches to retain an identifier even if the actual patch representation changes
through commutation).
The file hash (used in hashed repositories) is based on the file contents. This
is used not just for patches, but for inventories and pristine files as well.
Clearly, the file hash does not belong in patchinfo, because we would then lose
the property that patch info is preserved through commutation. Likewise, now
it does not seem like it would not really be wise for darcs changes to output
file hashes at all (as these file hashes will change from repository to
repository). Of course, it would be good if we came up with some means for 3rd
party tools to know how to retrieve the file associated with any given patch.
Does anybody have an idea? Maybe a new 'show' command with the manual warning
the user that the file hash can be different from repository to repository?
Maybe Ben and Lele in particular should comment, being authors of 3rd party
tools. Any thoughts? You can get at the file hash by looking at inventory
files, for what it's worth.
(removing David to avoid spamming him)
P.S. Lele, sorry for sending you on a wild goose chase. I didn't fully
understand the issue (nor was I fully paying attention) until now. Hopefully
you got something useful out of the experience!
|
msg6343 (view) |
Author: btcoburn |
Date: 2008-10-19.20:21:48 |
|
(Was off-list for 2 months due to work... trying to catch up now. :-)
In the meantime I rewrote the code that was stuck on this issue (forging tags
for disjunct subsets of patches) to parse the whole repo inventory. There was
another reason I had to do this, so it turned out not being much more work.
I understand your argument about information supplied by 'darcs changes'. I was
hoping to avoid doing something inefficient just to make the connection from
patches to file names (especially if darcs already has this information on hand
when providing change information). The power of 'darcs changes' for 3rd party
tools are the patch and patch subset selection options. I think it is fairly
common for a selection to be followed by inspecting the item(s) one has selected.
Would only showing file names with a special flag like '--show-filename' (in
'darcs changes') be a good compromise? Help would document that these names are
NOT stable and unchanging? [<- This still sounds "confusing/scary" so a
reassuring explanation in the manual would be helpful IMHO.]
P.S. I can live without this... And other developers can surely parse the repo
inventory and regularly rebuild their own map of patch info to file names... It
just seems to me that not providing this raises the entry barrier to writing
tools that leverage darcs.
|
msg8431 (view) |
Author: kowey |
Date: 2009-08-23.20:00:47 |
|
Well, I'd rather not see an extra flag and we've got the issue of the filenames
changing because of commutation.
Ben says he can live without this, so let's wont-fix for now.
See also issue1505.
|
msg10533 (view) |
Author: kowey |
Date: 2010-03-26.15:24:18 |
|
In msg5926, I argued against putting this information in darcs changes
on the basis that the file hashes change during commutation, and that it
may be unwise/misleading to make this information too readily accessible
(for fear that they would use it expecting it to be stable).
I've since come to reconsider this position. I think it makes sense now
to have both a hash="PATCHINFO-HASH" and a filehash="FILEHASH".
Steve Keuchel was working on this during the 2010-03 sprint. If we drop
the ball on this bug, you should get in touch with him.
|
|
Date |
User |
Action |
Args |
2008-05-18 12:10:12 | btcoburn | create | |
2008-05-18 12:14:07 | kowey | set | status: unread -> unknown nosy:
+ lele messages:
+ msg4747 assignedto: lele |
2008-05-18 18:06:43 | lele | set | nosy:
tommy, beschmi, kowey, dagit, lele, btcoburn, jaredj, Serware messages:
+ msg4755 |
2008-05-19 07:57:43 | lele | set | nosy:
tommy, beschmi, kowey, dagit, lele, btcoburn, jaredj, Serware messages:
+ msg4757 |
2008-05-19 08:50:03 | kowey | set | nosy:
+ droundy messages:
+ msg4760 |
2008-05-19 15:01:50 | droundy | set | priority: bug -> wishlist nosy:
droundy, tommy, beschmi, kowey, dagit, lele, btcoburn, jaredj, Serware |
2008-05-20 12:42:13 | lele | set | nosy:
droundy, tommy, beschmi, kowey, dagit, lele, btcoburn, jaredj, Serware messages:
+ msg4788 |
2008-09-07 10:50:37 | kowey | set | nosy:
- droundy messages:
+ msg5926 assignedto: lele -> |
2008-09-07 10:51:53 | kowey | set | topic:
- Regression nosy:
tommy, beschmi, kowey, dagit, lele, btcoburn, jaredj, Serware title: List patch file names in "darcs changes --xml-output" for hashed repos -> wish: mechanism for 3rd party tools to get patch file name (hashed repositories; file hash) |
2008-10-19 20:21:55 | btcoburn | set | nosy:
+ dmitry.kurochkin, simon, thorkilnaur messages:
+ msg6343 |
2009-08-06 21:05:03 | admin | set | nosy:
- beschmi |
2009-08-11 00:14:24 | admin | set | nosy:
- dagit |
2009-08-23 20:00:51 | kowey | set | status: unknown -> wont-fix nosy:
tommy, kowey, lele, simon, thorkilnaur, btcoburn, jaredj, dmitry.kurochkin, Serware topic:
- ProbablyEasy messages:
+ msg8431 |
2009-08-25 17:31:27 | admin | set | nosy:
+ darcs-devel, - simon |
2009-08-27 14:31:15 | admin | set | nosy:
tommy, kowey, darcs-devel, lele, thorkilnaur, btcoburn, jaredj, dmitry.kurochkin, Serware |
2009-10-23 22:43:02 | admin | set | nosy:
+ serware, - Serware |
2009-10-23 23:28:59 | admin | set | nosy:
+ Serware, - serware |
2010-03-26 15:22:56 | kowey | link | issue1768 superseder |
2010-03-26 15:24:19 | kowey | set | status: wont-fix -> needs-implementation topic:
+ Hashed, - Darcs2 messages:
+ msg10533 |
2017-07-30 23:52:12 | gh | set | status: needs-implementation -> given-up |
|