darcs

Issue 2740 For darcs log, combining --verbose and --machine-readable does not include the patch hash anywhere in the output

Title: For darcs log, combining --verbose and --machine-readable does not include the patch hash anywhere in the output
Priority:
Status: unknown
Milestone:
Resolved in:
Superseder:
Nosy List: tuckerm
Assigned To:
Topics:

Created on 2025-03-09.03:18:27 by tuckerm, last changed 2025-03-24.21:19:01 by bfrk.

Messages
msg24194 Author: tuckerm Date: 2025-03-09.03:18:23
Just came across something when trying to parse the output of `darcs log`, and it seems like this is probably not intentional.

If you run `darcs log --verbose --machine-readable`, the hash of the patch is not included anywhere in the output. There is no way to see the actual identifier for any of the patches that you are looking at.

The reason I wanted to do this was because I couldn't find a way to include the full diff when using --xml-output (adding --verbose doesn't seem to do anything with --xml-output), so I was parsing the regular output and noticed that the hash is simply not included if you do --machine-readable.

Darcs version: 2.16.5 (release)
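
For anyone who only needs the patch identifier, it can be read from the XML form of the log. The sketch below assumes that `darcs log --xml-output` emits a `<changelog>` root with `<patch>` elements carrying a `hash` attribute and a `<name>` child; please verify this against your Darcs version. The sample document and all of its values are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like `darcs log --xml-output` (assumed layout,
# invented values); replace it with the real command output.
SAMPLE_XML = """<changelog>
<patch author='alice@example.org' date='20250309031823' inverted='False' hash='0123456789abcdef0123456789abcdef01234567'>
  <name>fix typo in README</name>
</patch>
</changelog>"""


def patch_hashes(xml_text):
    """Return a list of (hash, name) pairs, one per <patch> element."""
    root = ET.fromstring(xml_text)
    return [
        (patch.get("hash"), patch.findtext("name", default="").strip())
        for patch in root.iter("patch")
    ]
```

Calling `patch_hashes(SAMPLE_XML)` yields one pair per patch. This does not solve the missing diff, it only shows where the identifier is available today.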
msg24195 Author: bfrk Date: 2025-03-11.09:20:11
Generally speaking, I'd prefer to add proper support for --xml-output --verbose, 
rather than adding new features to --machine-readable. However, this would 
require encoding the line content in hunks as binary data, e.g. via base64 or 
something similar, because as far as Darcs is concerned, the file content is just 
raw bytes.
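
To illustrate the base64 idea, here is a minimal sketch of wrapping one hunk line in an XML element and reading it back. The `<line>` element and its `encoding` attribute are hypothetical names chosen for this example; they are not part of any existing Darcs output format.

```python
import base64
import xml.etree.ElementTree as ET


def hunk_line_to_xml(raw: bytes) -> str:
    """Serialize raw bytes as a hypothetical <line encoding="base64"> element."""
    element = ET.Element("line", {"encoding": "base64"})
    # base64 output is pure ASCII, so it is valid in any XML document encoding.
    element.text = base64.b64encode(raw).decode("ascii")
    return ET.tostring(element, encoding="unicode")


def hunk_line_from_xml(xml_text: str) -> bytes:
    """Recover the original bytes from the element produced above."""
    element = ET.fromstring(xml_text)
    return base64.b64decode(element.text or "")
```

The round trip is lossless for arbitrary bytes, at the cost of making the hunk content unreadable to a human looking at the XML.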

Adding the patch identifier to --machine-readable is easy, but it is unclear to 
me how it should be presented in the output. The so-called machine-readable 
format is basically just the raw encoding of patches as they are stored 
internally, and that doesn't contain the "identifier" because that is merely a 
hash of the patch's metadata.
msg24196 Author: bfrk Date: 2025-03-14.19:21:35
One could perhaps treat hunk lines as UTF-8 encoded text by default, and find 
some way of escaping anything that doesn't fit this encoding. However, I am 
not aware of any library that offers such functionality, so this would have 
to be written by someone...
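
A rough sketch of what such an escaping scheme could look like, written in Python only for illustration. The `\xNN` notation and both function names are assumptions made for this example, not an existing Darcs or library convention. It relies on Python's `surrogateescape` error handler, which maps each undecodable byte 0xNN to the lone surrogate U+DCNN.

```python
def escape_hunk_line(raw: bytes) -> str:
    """Decode as UTF-8; write undecodable bytes as \\xNN and backslash as \\\\."""
    text = raw.decode("utf-8", errors="surrogateescape")
    out = []
    for ch in text:
        code = ord(ch)
        if ch == "\\":
            out.append("\\\\")  # escape the escape character itself
        elif 0xDC80 <= code <= 0xDCFF:
            out.append("\\x%02x" % (code - 0xDC00))  # byte that was not valid UTF-8
        else:
            out.append(ch)
    return "".join(out)


def unescape_hunk_line(text: str) -> bytes:
    """Inverse of escape_hunk_line for strings that it produced."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "\\":
            if text[i + 1] == "\\":
                out += b"\\"
                i += 2
            else:  # the form \xNN
                out.append(int(text[i + 2:i + 4], 16))
                i += 4
        else:
            out += text[i].encode("utf-8")
            i += 1
    return bytes(out)
```

Valid UTF-8 text without backslashes passes through unchanged, so typical hunks stay readable. Note that XML 1.0 also forbids most control characters (NUL, for example), so a real implementation would need to escape those as well.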
msg24197 Author: tuckerm Date: 2025-03-15.04:09:24
Ah, I had assumed that --machine-readable was just a slight reformatting
of the default log output; I hadn't realized that it was actually calling
something else under the hood.

There's also the fact that, if someone is already parsing that output,
adding some new data to it would break their parser.

Adding the --verbose output to the XML-formatted option would make the most sense,
in my opinion. Adding another field to that won't be a breaking change for anyone
reading that XML already.
msg24198 Author: bfrk Date: 2025-03-23.07:29:07
Do you have any suggestion how to handle the encoding issue with 
--xml-output --verbose?
msg24199 Author: tuckerm Date: 2025-03-24.06:33:32
I'm afraid I don't really understand what the encoding issue is. Is it a problem only
for binary files? My assumption was that the hunks in --xml-output --verbose would behave
the same way as in the normal `darcs log --verbose`. So, if the file is a binary file, it would
just say "binary" and not attempt to show a diff. Does `darcs log --verbose` check the
encoding of a file right now? Because it seems to always show a diff for a UTF file, and
always know that it's binary for any image/mp3/executable that I give it.
msg24200 Author: bfrk Date: 2025-03-24.21:01:07
No, it's not about binary patches, just regular hunks. The problem is that 
XML requires that you use the same text encoding for the whole document. 
Darcs has no idea how text is encoded; this is entirely up to the user. 
You may have one file in ISO-8859-1, another in UTF-8, and the next one in 
one of the 7 or 8 different pre-Unicode Japanese encodings.
msg24201 Author: bfrk Date: 2025-03-24.21:19:01
And to answer your question, this is decided based on two heuristics. 
The first is _darcs/prefs/binaries, which lists common file name extensions 
for binary file formats. The second is an internal heuristic that looks at 
the first 4k of a file and decides it is binary if there is a byte with 
value 0 or 26 (ASCII SUB, i.e. Ctrl-Z) among them.
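
The second, content-based heuristic can be sketched as follows. This is an illustration of the rule as described in this message, not the actual Darcs code; the function name and the `window` parameter are made up, and "4k" is taken to mean 4096 bytes.

```python
def looks_binary(data: bytes, window: int = 4096) -> bool:
    """Return True if byte 0 or byte 26 (0x1A) occurs in the first `window` bytes."""
    head = data[:window]
    return b"\x00" in head or b"\x1a" in head
```

Under this rule a text file in any 8-bit encoding counts as text as long as it has no such byte near the start, which is why the check says nothing about which encoding the text uses.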
History
Date                 User     Action  Args
2025-03-09 03:18:27  tuckerm  create
2025-03-11 09:20:14  bfrk     set     messages: + msg24195
2025-03-14 19:21:36  bfrk     set     messages: + msg24196
2025-03-15 04:09:24  tuckerm  set     messages: + msg24197
2025-03-23 07:29:07  bfrk     set     messages: + msg24198
2025-03-24 06:33:32  tuckerm  set     messages: + msg24199
2025-03-24 21:01:07  bfrk     set     messages: + msg24200
2025-03-24 21:19:01  bfrk     set     messages: + msg24201