darcs

Issue 1990 Darcs shows question mark diamonds instead of accented characters

Title: Darcs shows question mark diamonds instead of accented characters
Priority: bug
Status: duplicate
Milestone: 2.8.0
Resolved in:
Superseder: Check displaying of Unicode patch metadata
View: 1693
Nosy List: asr, dmitry.kurochkin, nomeata, tux_rocker
Assigned To: tux_rocker
Topics: Regression, UI

Created on 2010-11-07.13:10:28 by tux_rocker, last changed 2012-06-04.12:58:22 by nomeata.

Messages
msg12934 Author: tux_rocker Date: 2010-11-07.13:10:26
Andrés Sicard-Ramírez reported on darcs-users:

> Using darcs 2.4.4, darcs changes shows my author's name as

> Andrés Sicard-Ramírez

> but darcs 2.5 shows it as

> Andr�s Sicard-Ram�rez

> Is there anything I need to do to see the accent marks again?

to which Reinier replied:

> Most likely, this is not a regression and darcs 2.4.4 and darcs 2.5
> are both broken in different ways. Apparently darcs 2.4.4 works better
> for you :(.

> Now of course we're interested in fixing the brokenness in darcs 2.5.
> To do so I'd like to know:

> * What is your operating system?
> * If it's Linux, what is the output of "locale charmap"?
> * If it's okay to publish it, the output of "darcs changes --xml" for
>   this patch

And Andrés replied:


> I realized the issue is not only with the accent marks in the
> author's name, but also with Unicode symbols in the patch's name.

> $ darcs changes --xml
> <patch author='Andrés Sicard-Ramírez &lt;andres.sicard.ramirez@gmail.com&gt;'
> date='20101102164505' local_date='Tue Nov  2 11:45:05 COT 2010'
> inverted='False'
> hash='20101102164505-3bd4e-3ae47ad6181b492ca93f785914015e7fbb495ad0.gz'>
>       <name>Added x∷xs≈x∷ys→xs≈ys.</name>
>       <comment>Ignore-this: 558c75da5d71ad6cd286d878bdb4a9f3</comment>
> </patch>

> $ darcs changes
> Tue Nov  2 11:45:05 COT 2010  Andr�s Sicard-Ram�rez <andres.sicard.ramirez@gmail.com>
> * Added x7xsHx7ys�xsHys.
msg12935 Author: tux_rocker Date: 2010-11-07.13:19:24
By the way, Andrés also mentioned that his OS was Ubuntu 9.10 and the
output of 'locale charmap' was "UTF-8".

On my machine, when I try to record a patch with the same author and
name, I get:

===
$ darcs init
$ darcs record -A "Andrés Sicard-Ramírez" -m "Added x∷xs≈x∷ys→xs≈ys."
No changes!
$ touch blaat && darcs add blaat
$ darcs record -A "Andrés Sicard-Ramírez" -m "Added x∷xs≈x∷ys→xs≈ys."
addfile ./blaat
Shall I record this change? (1/1)  [ynW...], or ? for more options: y
Finished recording patch 'Added x7xsHx7ys�xsHys.'
$ darcs changes
Sun Nov  7 14:08:53 CET 2010  Andr<U+00E9>s Sicard-Ram<U+00ED>rez
  * Added x<U+2237>xs<U+2248>x<U+2237>ys<U+2192>xs<U+2248>ys.
reinier@adim:/tmp/test$ darcs changes --xml
<changelog>
<patch author='Andrés Sicard-Ramírez' date='20101107130853'
local_date='Sun Nov  7 14:08:53 CET 2010' inverted='False'
hash='20101107130853-6128e-70ed61c5add7edc51ba0412ba9a665b88dc340b7.gz'>
        <name>Added x∷xs≈x∷ys→xs≈ys.</name>
        <comment>Ignore-this: 5839805b5a10bcfd070b235588e2c641</comment>
</patch>
</changelog>
$ locale charmap
UTF-8
$ darcs --version
2.5 (release)
===

We can see that 'darcs changes' behaves as intended: it escapes each
non-ASCII character as <U+XXXX>, which is not optimal but at least shows
that darcs knows what it's doing internally. So I can't reproduce
Andrés's problem. But in the "Finished recording patch" message, we see
the same wrong version of the patch name that the reporter sees when he
runs 'darcs changes'. I'll investigate that further.
msg12936 Author: tux_rocker Date: 2010-11-07.15:54:58
Actually, I can reproduce it when I set the environment variable 
DARCS_DONT_ESCAPE_ANYTHING or DARCS_DONT_ESCAPE_8BIT to 1.
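
The garbled strings in this report are consistent with each character being cut down to its low 8 bits before the terminal decodes the result as UTF-8. The Python sketch below only illustrates that hypothesis: `truncate_to_8bit` is a hypothetical helper, not darcs code, and nothing in this thread confirms that darcs does this internally.

```python
def truncate_to_8bit(text: str) -> str:
    """Keep only the low 8 bits of each code point, then decode as UTF-8.

    Hypothetical model of the garbling, not taken from the darcs source.
    """
    raw = bytes(ord(ch) & 0xFF for ch in text)
    # Bytes that do not form valid UTF-8 become U+FFFD,
    # the "question mark diamond" replacement character.
    return raw.decode("utf-8", errors="replace")


print(truncate_to_8bit("Andrés Sicard-Ramírez"))
print(truncate_to_8bit("Added x∷xs≈x∷ys→xs≈ys."))
```

Under this model ∷ (U+2237) becomes byte 0x37, which is `7`; ≈ (U+2248) becomes 0x48, which is `H`; and → (U+2192) becomes 0x92, which is not valid UTF-8 on its own and is shown as �. The same happens to é (0xE9) and í (0xED) in the author name.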
msg15791 Author: nomeata Date: 2012-06-04.12:58:20
Since I started doing Agda work with lots of Unicode in the patches, I
am now also affected by this. With DARCS_DONT_ESCAPE_8BIT=1, the diffs
look good, but any Unicode character in the description is printed
wrong. Without DARCS_DONT_ESCAPE_8BIT set, I get lots of things like
<U+00C3><U+00BC>. Notable difference: ä is printed as <U+00E4> in the
description (which is the Unicode code point), but as <U+00C3><U+00A4>
in the diff (which is the UTF-8 encoding of ä).
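
The difference between the two escaped forms can be reproduced with a short Python sketch. The `<U+XXXX>` format is copied from the output quoted in this issue; `escape_code_points` and `escape_utf8_bytes` are hypothetical helpers for illustration and are not a description of how darcs implements its escaping.

```python
def escape_code_points(text: str) -> str:
    """Escape each non-ASCII character by its Unicode code point."""
    return "".join(
        ch if ord(ch) < 0x80 else f"<U+{ord(ch):04X}>" for ch in text
    )


def escape_utf8_bytes(text: str) -> str:
    """Escape each non-ASCII byte of the UTF-8 encoding separately."""
    return "".join(
        chr(b) if b < 0x80 else f"<U+{b:04X}>" for b in text.encode("utf-8")
    )


print(escape_code_points("ä"))  # <U+00E4>
print(escape_utf8_bytes("ä"))   # <U+00C3><U+00A4>
print(escape_utf8_bytes("ü"))   # <U+00C3><U+00BC>
```

The first function treats the text as decoded characters, the second as raw bytes. Seeing both forms in one program's output suggests that the description and the diff take different paths before being escaped, though that is an inference from the output rather than something verified against the darcs source.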
History
Date                 User        Action  Args
2010-11-07 13:10:28  tux_rocker  create
2010-11-07 13:19:25  tux_rocker  set     messages: + msg12935
2010-11-07 15:54:58  tux_rocker  set     messages: + msg12936
2010-11-07 16:01:12  tux_rocker  set     status: needs-reproduction -> duplicate
                                         superseder: + Check displaying of Unicode patch metadata
2010-11-07 20:29:19  asr         set     nosy: + asr
2012-06-04 12:58:22  nomeata     set     nosy: + nomeata
                                         messages: + msg15791