darcs

Issue 859 wish: mechanism for 3rd party tools to get patch file name (hashed repositories; file hash)

Title wish: mechanism for 3rd party tools to get patch file name (hashed repositories; file hash)
Priority wishlist Status given-up
Milestone Resolved in
Superseder Nosy List Serware, btcoburn, darcs-devel, dmitry.kurochkin, jaredj, kowey, lele, thorkilnaur, tommy
Assigned To
Topics Hashed

Created on 2008-05-18.12:10:12 by btcoburn, last changed 2017-07-30.23:52:12 by gh.

Messages
msg4746 (view) Author: btcoburn Date: 2008-05-18.12:10:07
Patch hashes and file names no longer match in darcs2 and hashed format
repositories. This makes it impossible to lookup patch files directly from
"darcs changes --xml-output". 

For example a patch with darcs hash
20080128005838-05dcb-a7c228b57b2e79e7bffc030afe4a0779e4ffd17b now has file name
0000040078-73e502b633fb4b4cb3153793cd948fa2b95bc3f05ff570c1a7440a93340f60ea.

Please update the XML output for darcs2 & hashed repos. I suggest adding a
"patch-file" attribute to the "patch" element. Also, the "hash" attribute should
no longer end in ".gz" as this is misleading. Thanks.
msg4747 (view) Author: kowey Date: 2008-05-18.12:14:02
Lele: does this sound like something you might be interested in working on? 
Feel free to unassign yourself, of course!  (or I'll do so if I don't hear from
you in a while)
msg4755 (view) Author: lele Date: 2008-05-18.18:06:39
On Sun, 18 May 2008 12:14:07 -0000
Eric Kow <bugs@darcs.net> wrote:

> Lele: does this sound like something you might be interested in
> working on? Feel free to unassign yourself, of course!  (or I'll do
> so if I don't hear from you in a while)

I spent a few rainy-hours on the issue, basically finding how the
filename gets computed, and then trying to augment Info.lhs::to_xml to
show it. But, even if simple enough, I wasn't able to workaround the
"Module imports form a cycle for modules" I got.

This is where I am right now:

    hunk ./src/Darcs/Patch/Info.lhs 37
    +import Darcs.Patch.Core ( infopatch )
    +import Crypt.SHA256 ( sha256sum )
    hunk ./src/Darcs/Patch/Info.lhs 145
    +    <+> text "file='" <> text (make_patch_filename pi) <> text "'>"
    hunk ./src/Darcs/Patch/Info.lhs 193
    +make_patch_filename :: PatchInfo -> String
    +make_patch_filename pi = hash
    +    where ps = infopatch pi
    +          hash = case show (lengthPS ps) of
    +                 x | l > 10 -> sha256sum ps
    +                   | otherwise -> take (10-l) (repeat '0') ++ x ++'-':sha256sum ps
    +                                        where l = length x
    +

that, as you can see, is little more than a cut&paste from
Darcs/Repository/Prefs.lhs, which I'm not even sure I understand :-)

Can some kind soul explain me

a) the meaning of the "x | l > 10" expression in the case stmt?
b) point me in the right direction to avoid the circularity?

Thank you, first of all for the encouragement!
ciao, lele.
-- 
nickname: Lele Gaifax    | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas    | comincerò ad aver paura di chi mi copia.
lele@nautilus.homeip.net |                 -- Fortunato Depero, 1929.
msg4757 (view) Author: lele Date: 2008-05-19.07:57:40
Hi,

the code below at least compiles and produces something...

hunk ./src/Darcs/Patch/Info.lhs 36
+import Crypt.SHA256 ( sha256sum )
hunk ./src/Darcs/Patch/Info.lhs 144
+    <+> text "file='" <> text (make_patch_filename pi) <> text "'>"
hunk ./src/Darcs/Patch/Info.lhs 192
+make_patch_filename :: PatchInfo -> String
+make_patch_filename _ =
+    let pc = packString "lele" -- of course this should be the actual patch content
+        cl = lengthPS pc
+        l = show cl
+        ll = length l
+        in
+        if ll>10 then sha256sum pc -- ?? so, beyond 10Gb, just the checksum ??
+                 else take (10-ll) (repeat '0') ++ l ++ '-' : sha256sum pc
+

I'm still missing the key point, that is, obtaining the actual patch
contents given its PatchInfo. Another point that needs to be cleared
is the limit on 10Gb, beyond that the names is composed only by the
checksum.

Thanks in advance for any help,
ciao, lele.
-- 
nickname: Lele Gaifax    | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas    | comincerò ad aver paura di chi mi copia.
lele@nautilus.homeip.net |                 -- Fortunato Depero, 1929.
msg4760 (view) Author: kowey Date: 2008-05-19.08:50:00
> +        ll = length l
> +        in
> +        if ll>10 then sha256sum pc -- ?? so, beyond 10Gb, just the checksum ??
> +                 else take (10-ll) (repeat '0') ++ l ++ '-' : sha256sum pc

Haha, it took me a while to figure out why you were talking about 10
/giga/bytes. Now I understand that it's because we're counting the
number of digits used to represent the string length (which we might
as well do since we have to write the length out as a string anyway).

Looking in darcs changes, I see that David added this string length
code to help deal with the timestamp issue.   I am guessing here that
for some reason, we want these file length strings to be uniform
(notice the padding?) and that we use just use arbitrary cutoff (10
digits) to do this.  I'm not sure though.  David: could you explain
why we stop using the file length on >= 10G?

Tue Mar  4 17:11:20 GMT 2008  David Roundy <droundy@darcs.net>
  * add option to store file lengths with hashes.
  This is a maneuver towards alleaviating the trouble with timestamps, since
  we can quickly check if a file length has changed.

By the way, Lele you should consider adding an entry to
http://wiki.darcs.net/index.html/DarcsInternals explaining how patch
hashes and filenames are computed (with a historical perspective)

> I'm still missing the key point, that is, obtaining the actual patch
> contents given its PatchInfo

I can't think of a way to do this.  It may be necessary to think about
where the code for generating these XML lives.  Might have to bump it
up now that it (presumably) needs to know about the patch contents as
well as the metadata.  It could be useful to look at the code that
writes the inventory and see what it does.
msg4788 (view) Author: lele Date: 2008-05-20.12:42:12
On Mon, 19 May 2008 08:50:03 -0000
Eric Kow <bugs@darcs.net> wrote:

> By the way, Lele you should consider adding an entry to
> http://wiki.darcs.net/index.html/DarcsInternals explaining how patch
> hashes and filenames are computed (with a historical perspective)

http://wiki.darcs.net/index.html/NamedPatch

ciao, lele.
-- 
nickname: Lele Gaifax    | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas    | comincerò ad aver paura di chi mi copia.
lele@nautilus.homeip.net |                 -- Fortunato Depero, 1929.
msg5926 (view) Author: kowey Date: 2008-09-07.10:50:30
So I now understand this issue better.  

There are two hashes that darcs computes for a patch.  

The patch hash, which is based exclusively on its metadata (this allows named
patches to retain an identifier even if the actual patch representation changes
through commutation).

The file hash (used in hashed repositories) is based on the file contents.  This
is used not just for patches, but for inventories and pristine files as well.

Clearly, the file hash does not belong in patchinfo, because we would then lose
the property that patch info is preserved through commutation.   Likewise, now
it does not seem like it would not really be wise for darcs changes to output
file hashes at all (as these file hashes will change from repository to
repository).  Of course, it would be good if we came up with some means for 3rd
party tools to know how to retrieve the file associated with any given patch. 
Does anybody have an idea?  Maybe a new 'show' command with the manual warning
the user that the file hash can be different from repository to repository?  

Maybe Ben and Lele in particular should comment, being authors of 3rd party
tools.  Any thoughts?  You can get at the file hash by looking at inventory
files, for what it's worth.

(removing David to avoid spamming him)

P.S. Lele, sorry for sending you on a wild goose chase. I didn't fully
understand the issue (nor was I fully paying attention) until now.  Hopefully
you got something useful out of the experience!
msg6343 (view) Author: btcoburn Date: 2008-10-19.20:21:48
(Was off-list for 2 months due to work... trying to catch up now. :-)

In the meantime I rewrote the code that was stuck on this issue (forging tags
for disjunct subsets of patches) to parse the whole repo inventory. There was
another reason I had to do this, so it turned out not being much more work.

I understand your argument about information supplied by 'darcs changes'. I was
hoping to avoid doing something inefficient just to make the connection from
patches to file names (especially if darcs already has this information on hand
when providing change information). The power of 'darcs changes' for 3rd party
tools are the patch and patch subset selection options. I think it is fairly
common for a selection to be followed by inspecting the item(s) one has selected.

Would only showing file names with a special flag like '--show-filename' (in
'darcs changes') be a good compromise? Help would document that these names are
NOT stable and unchanging? [<- This still sounds "confusing/scary" so a
reassuring explanation in the manual would be helpful IMHO.]

P.S. I can live without this... And other developers can surely parse the repo
inventory and regularly rebuild their own map of patch info to file names... It
just seems to me that not providing this raises the entry barrier to writing
tools that leverage darcs.
msg8431 (view) Author: kowey Date: 2009-08-23.20:00:47
Well, I'd rather not see an extra flag and we've got the issue of the filenames
changing because of commutation.

Ben says he can live without this, so let's wont-fix for now.

See also issue1505.
msg10533 (view) Author: kowey Date: 2010-03-26.15:24:18
In msg5926, I argued against putting this information in darcs changes
on the basis that the file hashes change during commutation, and that it
may be unwise/misleading to make this information too readily accessible
(for fear that they would use it expecting it to be stable).

I've since come to reconsider this position.  I think it makes sense now
to have both a hash="PATCHINFO-HASH" and a filehash="FILEHASH".

Steve Keuchel was working on this during the 2010-03 sprint.  If we drop
the ball on this bug, you should get in touch with him.
History
Date User Action Args
2008-05-18 12:10:12btcoburncreate
2008-05-18 12:14:07koweysetstatus: unread -> unknown
nosy: + lele
messages: + msg4747
assignedto: lele
2008-05-18 18:06:43lelesetnosy: tommy, beschmi, kowey, dagit, lele, btcoburn, jaredj, Serware
messages: + msg4755
2008-05-19 07:57:43lelesetnosy: tommy, beschmi, kowey, dagit, lele, btcoburn, jaredj, Serware
messages: + msg4757
2008-05-19 08:50:03koweysetnosy: + droundy
messages: + msg4760
2008-05-19 15:01:50droundysetpriority: bug -> wishlist
nosy: droundy, tommy, beschmi, kowey, dagit, lele, btcoburn, jaredj, Serware
2008-05-20 12:42:13lelesetnosy: droundy, tommy, beschmi, kowey, dagit, lele, btcoburn, jaredj, Serware
messages: + msg4788
2008-09-07 10:50:37koweysetnosy: - droundy
messages: + msg5926
assignedto: lele ->
2008-09-07 10:51:53koweysettopic: - Regression
nosy: tommy, beschmi, kowey, dagit, lele, btcoburn, jaredj, Serware
title: List patch file names in "darcs changes --xml-output" for hashed repos -> wish: mechanism for 3rd party tools to get patch file name (hashed repositories; file hash)
2008-10-19 20:21:55btcoburnsetnosy: + dmitry.kurochkin, simon, thorkilnaur
messages: + msg6343
2009-08-06 21:05:03adminsetnosy: - beschmi
2009-08-11 00:14:24adminsetnosy: - dagit
2009-08-23 20:00:51koweysetstatus: unknown -> wont-fix
nosy: tommy, kowey, lele, simon, thorkilnaur, btcoburn, jaredj, dmitry.kurochkin, Serware
topic: - ProbablyEasy
messages: + msg8431
2009-08-25 17:31:27adminsetnosy: + darcs-devel, - simon
2009-08-27 14:31:15adminsetnosy: tommy, kowey, darcs-devel, lele, thorkilnaur, btcoburn, jaredj, dmitry.kurochkin, Serware
2009-10-23 22:43:02adminsetnosy: + serware, - Serware
2009-10-23 23:28:59adminsetnosy: + Serware, - serware
2010-03-26 15:22:56koweylinkissue1768 superseder
2010-03-26 15:24:19koweysetstatus: wont-fix -> needs-implementation
topic: + Hashed, - Darcs2
messages: + msg10533
2017-07-30 23:52:12ghsetstatus: needs-implementation -> given-up