darcs

Patch 2456 Fix rendering of non-ASCII characters in `darcs show dependencies`

Title Fix rendering of non-ASCII characters in `darcs show dependencies`
Superseder Nosy List ki11men0w
Related Issues
Status accepted Assigned To
Milestone

Created on 2025-04-12.00:09:08 by ki11men0w, last changed 2025-04-23.13:13:32 by bfrk.

Files
File name Status Uploaded Type
fix-rendering-of-non_ascii-characters-in-darcs-show-dependencies.dpatch ki11men0w, 2025-04-12.00:09:07 application/octet-stream
See mailing list archives for discussion on individual patches.
Messages
msg24203 (view) Author: ki11men0w Date: 2025-04-12.00:09:07
Using `show` (from the `Show` instance) to escape non-ASCII characters in
patch comments when rendering output for graphviz makes the
`darcs show dependencies` command pointless if patch comments are
written in national languages.
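
To illustrate the effect described above, here is a minimal sketch of what
`show` does to a Cyrillic string. The names `cyrillicName` and
`escapedByShow` are hypothetical and are not taken from the darcs sources.

```haskell
-- Illustration only: these names are hypothetical, not darcs code.

-- A patch name written in Cyrillic.
cyrillicName :: String
cyrillicName = "Привет"

-- `show` replaces every non-ASCII character with a decimal escape,
-- so the label that reaches graphviz is no longer readable.
escapedByShow :: String
escapedByShow = show cyrillicName
-- putStrLn escapedByShow prints: "\1055\1088\1080\1074\1077\1090"
```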

I think such strict escaping is redundant. It is enough to apply the
escaping rules for quoted strings in the DOT language:
> In quoted strings in DOT, the only escaped character is
> double-quote `"`.  That is, in quoted strings, the dyad `\"` is
> converted to `"`; all other characters are left unchanged.
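
The quoted rule can be sketched as follows. This is an illustration under
the assumption that only the double quote needs escaping. It is not the
code from the attached patch, and `escapeDotString` and `quoteDot` are
hypothetical names. Edge cases such as a backslash immediately before the
closing quote are not considered here.

```haskell
-- Illustration only: hypothetical helpers, not the code from the patch.

-- Escape a string for use inside a DOT quoted string: only the double
-- quote is rewritten, every other character (including non-ASCII) is
-- passed through unchanged.
escapeDotString :: String -> String
escapeDotString = concatMap escapeChar
  where
    escapeChar '"' = "\\\""
    escapeChar c   = [c]

-- Wrap the escaped text in double quotes so it can be used as a DOT ID
-- or attribute value.
quoteDot :: String -> String
quoteDot s = "\"" ++ escapeDotString s ++ "\""
```

With these definitions, `quoteDot "Привет"` keeps the Cyrillic text as is,
while an embedded `"` becomes `\"`.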

There remains one other potential problem that is guaranteed to be
solved by using `show` - encoding.  Graphviz by default assumes that the
input with instructions for it is encoded in UTF-8.  If the input
contains invalid UTF-8 byte sequences, graphviz will be able to process
it only if we additionally set the document attribute `charset=latin1`
(this attribute can be set, for example, by specifying the command line
option `dot -Gcharset=latin1 ...`). In this case, messages with
non-ASCII characters in the final document created by graphviz will also
be distorted. Users who write in national languages but do not use UTF-8
encoding will run into this problem. For these users, however, things
will not get worse: comments are distorted both with and without
`show`.
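
Whether `charset=latin1` is needed depends on whether the comment bytes
form valid UTF-8. The following is a minimal sketch of such a check using
only `base`. `isValidUtf8` is a hypothetical helper, not part of darcs or
of this patch, and it works on a plain list of bytes for simplicity.

```haskell
import Data.Word (Word8)

-- Illustration only: a hypothetical helper, not part of darcs.
-- Returns True if the byte sequence is well-formed UTF-8 (RFC 3629).
isValidUtf8 :: [Word8] -> Bool
isValidUtf8 [] = True
isValidUtf8 (b : rest)
  | b <= 0x7F              = isValidUtf8 rest           -- ASCII
  | b >= 0xC2 && b <= 0xDF = continuation 1 0x80 0xBF rest
  | b == 0xE0              = continuation 2 0xA0 0xBF rest
  | b == 0xED              = continuation 2 0x80 0x9F rest  -- no surrogates
  | b >= 0xE1 && b <= 0xEF = continuation 2 0x80 0xBF rest
  | b == 0xF0              = continuation 3 0x90 0xBF rest
  | b >= 0xF1 && b <= 0xF3 = continuation 3 0x80 0xBF rest
  | b == 0xF4              = continuation 3 0x80 0x8F rest
  | otherwise              = False
  where
    -- Expect n continuation bytes: the first one must lie in [lo, hi],
    -- the remaining ones in 0x80..0xBF.
    continuation :: Int -> Word8 -> Word8 -> [Word8] -> Bool
    continuation n lo hi bytes =
      case splitAt n bytes of
        (c : cs, remaining)
          | length (c : cs) == n
          , c >= lo && c <= hi
          , all (\x -> x >= 0x80 && x <= 0xBF) cs
          -> isValidUtf8 remaining
        _ -> False
```

If such a check fails for any comment, the output would have to be
processed with `dot -Gcharset=latin1 ...` as described above.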

Those who use only ASCII characters or UTF-8 encoding for comments will
not notice any difference.

The only case where this change makes things worse is when only a
very small part of the comments is not encoded in UTF-8. The variant with
`show` distorted only that small part of the patches, and graphviz always
accepted the output from darcs without errors. After applying this
patch, users will have to explicitly specify the `charset=latin1`
attribute in this case, which may be perceived as a degradation.

The positive effect of this patch will be felt by those who write
comments to patches in national languages and use a locale with UTF-8
encoding. It seems to me that this group of users is much larger than
those who will encounter the problems described above. With `show`,
there is no point in using the `darcs show dependencies` command for
these users.

It might be worth adding a command line option to the `show
dependencies` command that allows choosing whether to fully escape
non-ASCII characters as before or not. In my opinion, such an option is
redundant, but I am ready to add it if the core team deems it right.

I have been actively using darcs with this patch, with Cyrillic comments
encoded in UTF-8, for quite a long time without any problems. During
that time I used several versions of graphviz (from 2.43.0 to
12.2.0).
msg24205 (view) Author: bfrk Date: 2025-04-23.13:13:31
Makes complete sense. Accepted.
History
Date User Action Args
2025-04-12 00:09:08 ki11men0w create
2025-04-23 13:13:32 bfrk set status: needs-screening -> accepted
messages: + msg24205