|
Created on 2009-11-15.18:08:14 by tux_rocker, last changed 2015-08-11.13:54:08 by xancorreu.
msg9352 |
Author: tux_rocker |
Date: 2009-11-15.18:08:11 |
|
Once issue64 is solved, darcs will store patch metadata as Unicode strings.
However, the code that displays the metadata was not written with non-ASCII
characters in mind. I believe I have seen it replace non-ASCII characters with
numeric escapes, but I have not been able to find the code that does this in the
Printer module.
There is also the conceptual question of how Printer.renderPS should treat
non-ASCII characters.
|
msg13112 |
Author: kowey |
Date: 2010-11-19.10:13:45 |
|
Is this worthy of a Darcs 2.5.1?
Lele just experienced an issue similar to issue1990.
|
msg13156 |
Author: tux_rocker |
Date: 2010-11-21.16:24:24 |
|
If we get a fix, yes.
I looked at it for a bit and ultimately what happens is that hPutStr
writes only the lowest 8 bits of a character with a code point > 255. To
solve this, we must either explicitly encode the text with the locale
encoding before hPutStr'ing it, or we require GHC >= 6.12 and don't
switch file handles to binary mode.
The latter solution seems better because it is what the IO library is
designed for. But I don't know which parts of darcs require the file
handles to be in binary mode, and why.
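A minimal sketch of the first option, encoding explicitly before writing, assuming `GHC.Foreign` from `base` is available. `hPutStrEncoded` and `encodeString` are hypothetical helpers, not darcs functions: the String is encoded with a chosen `TextEncoding` (in darcs this would be the locale encoding) and the resulting bytes go out through `hPutBuf`, so the handle can stay in binary mode.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Array (peekArray)
import Foreign.Ptr (castPtr)
import qualified GHC.Foreign as GF
import System.IO

-- Hypothetical helper: encode a String with the given encoding and write the
-- resulting bytes to the handle. hPutBuf copies raw bytes, so a binary-mode
-- handle's own char8 encoding never gets to truncate code points above 255.
hPutStrEncoded :: TextEncoding -> Handle -> String -> IO ()
hPutStrEncoded enc h s =
  GF.withCStringLen enc s $ \(ptr, len) -> hPutBuf h ptr len

-- Hypothetical helper: the bytes a String encodes to, handy for checking.
encodeString :: TextEncoding -> String -> IO [Word8]
encodeString enc s =
  GF.withCStringLen enc s $ \(ptr, len) -> peekArray len (castPtr ptr)

main :: IO ()
main = do
  -- Keep stdout in binary mode and do the encoding ourselves.
  hSetBinaryMode stdout True
  -- A real program would pass localeEncoding here; utf8 is used so the demo
  -- behaves the same in every environment (an ASCII/C locale would throw).
  hPutStrEncoded utf8 stdout "берегам\n"
  encodeString utf8 "берегам" >>= print
  hFlush stdout
```

Characters that the chosen encoding cannot represent make the encoder throw, so a real implementation would still have to decide how to handle them.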
Note that in Haskell you don't have to put the handle in binary mode to get
the Windows line-ending behavior that the "binary mode" flag in C is for.
'hSetNewlineMode h noNewlineTranslation' does that for you and, with a
suitable handle encoding, still writes characters outside Latin-1 correctly.
|
msg13160 |
Author: mornfall |
Date: 2010-11-21.17:17:51 |
|
Reinier Lamers <bugs@darcs.net> writes:
> I looked at it for a bit and ultimately what happens is that hPutStr
> writes only the lowest 8 bits of a character with a code point > 255. To
> solve this, we must either explicitly encode the text with the locale
> encoding before hPutStr'ing it, or we require GHC >= 6.12 and don't
> switch file handles to binary mode.
We can't do the latter, since it breaks Windows. Non-binary handles only
work with GHC <= 6.10 on that platform; with 6.12 or later, encountering a
"bad" character will crash the program. This happens to darcs a lot, for
various reasons. You really need to use binary handles for everything.
Yours,
Petr.
|
msg13356 |
Author: kowey |
Date: 2010-12-16.16:44:14 |
|
We made an impromptu attempt at working out what was going on in
http://irclog.perlgeek.de/darcs/2010-12-16#i_3094602
It probably duplicates a lot of stuff that Reinier already knows, but it
never hurts to repeat the what-the-heck-is-darcs-doing exercise across
different members of the team.
|
msg13373 |
Author: kowey |
Date: 2010-12-17.10:11:20 |
|
These two emails may be relevant:
- http://lists.osuosl.org/pipermail/darcs-users/2010-December/025932.html
- http://lists.osuosl.org/pipermail/darcs-users/2010-May/024023.html
|
msg13466 |
Author: ganesh |
Date: 2011-01-05.23:16:15 |
|
Bumping to 2.5.2 - if a fix appears soon it might still make 2.5.1.
|
msg14549 |
Author: nad |
Date: 2011-06-23.16:25:03 |
|
I just want to note that, due to this bug (in particular, the kind of
output described in issue 1990), I am still using darcs 2.4.4.
|
msg17050 |
Author: rpglover64 |
Date: 2013-09-25.18:47:30 |
|
This bug's status is "needs-reproduction". Does this mean that the
maintainers have not been able to replicate it?
If that's the case, here's a quick way to reproduce the bug:
> darcs init foo
> cd foo
> touch foo
> darcs add foo
> darcs record
As the patch name, use "берегам" (or any other string containing non-ASCII characters)
> darcs log
This displays
* <U+0431><U+0435><U+0440><U+0435><U+0433><U+0430><U+043C>
if DARCS_DONT_ESCAPE_8BIT=0 and
* 15@530<
if DARCS_DONT_ESCAPE_8BIT=1
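Both outputs fit the diagnosis in msg13156. The second line is exactly what remains of "берегам" when only the lowest 8 bits of each code point are kept (U+0431 becomes 0x31, the character '1', and so on). The pure model below reproduces both lines; `escapeNonAscii` and `truncateTo8Bits` are hypothetical names, not darcs's printer code, which has its own escaping rules.

```haskell
import Data.Bits ((.&.))
import Data.Char (chr, ord, toUpper)
import Numeric (showHex)

-- Model of the escaped output: every non-ASCII character becomes <U+XXXX>
-- with at least four upper-case hex digits.
escapeNonAscii :: String -> String
escapeNonAscii = concatMap esc
  where
    esc c
      | ord c < 128 = [c]
      | otherwise   = "<U+" ++ pad4 (map toUpper (showHex (ord c) "")) ++ ">"
    pad4 s = replicate (4 - length s) '0' ++ s

-- Model of writing through a binary-mode (char8) handle:
-- only the lowest 8 bits of each code point survive.
truncateTo8Bits :: String -> String
truncateTo8Bits = map (chr . (.&. 0xFF) . ord)

main :: IO ()
main = do
  let name = "берегам"
  putStrLn ("* " ++ escapeNonAscii name)   -- * <U+0431><U+0435><U+0440><U+0435><U+0433><U+0430><U+043C>
  putStrLn ("* " ++ truncateTo8Bits name)  -- * 15@530<
```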
|
msg17859 |
Author: ganesh |
Date: 2014-11-26.06:34:19 |
|
We shouldn't have left this hanging so long; it's pretty fundamental for
anyone in a non-ASCII locale. I'm taking a look at it now.
|
msg17884 |
Author: ganesh |
Date: 2014-12-08.19:20:21 |
|
I've spent some time looking at this and have sent draft patch1239.
I don't really like it as it's a layer of sticking plaster on top of an already messy
situation. It also doesn't make the situation any better on Windows or in non-UTF8
locales, though I don't think it makes it any worse.
The patch basically adds a new parameter to all the printing code to decide whether to
locale-encode Strings as they are output or not. Bytestrings are left alone as they
typically seem to be sourced from patch metadata which is already UTF8-encoded.
I don't think we can do this translation unconditionally because the same code is used
to print out patch files on disk etc. It seems to me that the right long-term solution
would be to use phantom types to tag both Strings and Bytestrings with their encoding
(e.g. locale, UTF8, 7bit-only). This would make it clear where in the code we should
convert things. We should also cleanly split out the "print for user consumption" and
"print for persistence" use cases; they're currently tangled together all the way up
to classes like ShowPatch.
Users will also need to set the environment variable DARCS_DONT_ESCAPE_8BIT=1 to get
UTF8 output. It feels like we should make that the default, but I don't fully
understand the implications and why it isn't already the default.
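A minimal sketch of the phantom-type idea, under stated assumptions: the tags `Locale`, `UTF8` and `SevenBit`, the `Bytes` type and all functions are hypothetical, and `[Word8]` stands in for `ByteString` so the example needs only `base`. The encoding lives in the type, so the only route from stored UTF-8 to the terminal is an explicit conversion.

```haskell
import Data.Char (isAscii, ord)
import Data.Word (Word8)
import Foreign.Marshal.Array (peekArray, withArrayLen)
import Foreign.Ptr (castPtr)
import qualified GHC.Foreign as GF
import System.IO

-- Hypothetical encoding tags: empty types used only as phantom parameters.
data Locale
data UTF8
data SevenBit

-- Raw bytes tagged with the encoding they are known to be in.
newtype Bytes enc = Bytes [Word8]
  deriving (Eq, Show)

encodeWith :: TextEncoding -> String -> IO [Word8]
encodeWith enc s = GF.withCStringLen enc s $ \(p, n) -> peekArray n (castPtr p)

decodeWith :: TextEncoding -> [Word8] -> IO String
decodeWith enc ws = withArrayLen ws $ \n p -> GF.peekCStringLen enc (castPtr p, n)

-- Only pure ASCII may be tagged as 7-bit.
sevenBit :: String -> Maybe (Bytes SevenBit)
sevenBit s
  | all isAscii s = Just (Bytes (map (fromIntegral . ord) s))
  | otherwise     = Nothing

-- 7-bit data is already valid UTF-8, so this conversion is free.
sevenBitToUtf8 :: Bytes SevenBit -> Bytes UTF8
sevenBitToUtf8 (Bytes ws) = Bytes ws

-- Patch metadata as stored: UTF-8.
encodeUtf8 :: String -> IO (Bytes UTF8)
encodeUtf8 s = fmap Bytes (encodeWith utf8 s)

-- The one place where stored UTF-8 becomes output in the user's encoding.
utf8ToLocale :: TextEncoding -> Bytes UTF8 -> IO (Bytes Locale)
utf8ToLocale loc (Bytes ws) = decodeWith utf8 ws >>= fmap Bytes . encodeWith loc

-- Only locale-tagged bytes may be written for user consumption.
putLocale :: Handle -> Bytes Locale -> IO ()
putLocale h (Bytes ws) = withArrayLen ws $ \n p -> hPutBuf h p n

main :: IO ()
main = do
  hSetBinaryMode stdout True
  stored <- encodeUtf8 "берегам\n"
  -- A real program would pass localeEncoding; utf8 keeps the demo portable.
  putLocale stdout =<< utf8ToLocale utf8 stored
  -- putLocale stdout stored   -- rejected by the type checker: UTF8 is not Locale
  hFlush stdout
```

In a real module the `Bytes` constructor would not be exported, so the smart constructors and conversion functions would be the only way to attach or change a tag.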
|
msg18016 |
Author: bfrk |
Date: 2015-02-05.14:49:53 |
|
Patch1239 was a first attempt at bringing this closer to a conclusion.
It makes it apparent (and easily changeable) when and where a possible
encoding step gets inserted.
However, the question remains: how do we know in general when to pass
Encode and when to pass Standard? Deciding this requires non-local
knowledge about what kind of data we are processing, which is error-prone.
My gut feeling is that the *producer* should put that information into
the Doc, rather than the consumer guessing it. If I am right with that,
the distinction can and should be made apparent in the type of the data,
for instance by adding a type parameter
data Doc (enc :: RenderMode) = ...
Here, RenderMode is promoted to a kind. This needs the DataKinds
extension which is available since ghc-7.6.x.
I have moved the milestone to 3.0.0 because I do not think we have a
definite solution yet.
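A minimal sketch of the type-level render mode with DataKinds. Only the names `Encode`, `Standard`, `RenderMode` and `Doc` come from the discussion; the toy `Doc`, the producers, `KnownMode` and `renderDoc` are hypothetical. As an assumption for this sketch, `Encode` means "convert to the given (locale) encoding" and `Standard` means "write UTF-8 as stored".

```haskell
{-# LANGUAGE DataKinds, FlexibleInstances, KindSignatures, ScopedTypeVariables #-}

import Data.Proxy (Proxy (..))
import Data.Word (Word8)
import Foreign.Marshal.Array (peekArray)
import Foreign.Ptr (castPtr)
import qualified GHC.Foreign as GF
import System.IO

-- Promoted to a kind by DataKinds.
data RenderMode = Encode | Standard

-- Toy Doc: the mode is fixed by whoever builds it (the producer).
newtype Doc (mode :: RenderMode) = Doc [String]

userText :: String -> Doc 'Encode
userText s = Doc [s]

persistText :: String -> Doc 'Standard
persistText s = Doc [s]

-- Docs can only be combined within a single mode.
(<+>) :: Doc m -> Doc m -> Doc m
Doc a <+> Doc b = Doc (a ++ b)

-- Recover the type-level mode at run time.
class KnownMode (m :: RenderMode) where
  modeVal :: Proxy m -> RenderMode

instance KnownMode 'Encode where
  modeVal _ = Encode

instance KnownMode 'Standard where
  modeVal _ = Standard

-- The consumer no longer guesses: the Doc's type says how to render it.
renderDoc :: forall m. KnownMode m => TextEncoding -> Doc m -> IO [Word8]
renderDoc loc (Doc parts) =
  GF.withCStringLen enc (concat parts) $ \(p, n) -> peekArray n (castPtr p)
  where
    enc = case modeVal (Proxy :: Proxy m) of
      Encode   -> loc
      Standard -> utf8

main :: IO ()
main = do
  -- utf16le stands in for a locale encoding so the two modes visibly differ.
  renderDoc utf16le (userText "берегам" <+> userText "!") >>= print
  renderDoc utf16le (persistText "берегам") >>= print
  -- userText "a" <+> persistText "b"   -- type error: the modes differ
```

Because the mode is part of the type, mixing user-facing and persistence output fails to compile, and no printing function needs an extra flag argument.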
|
msg18722 |
Author: xancorreu |
Date: 2015-08-11.13:54:06 |
|
It works with `export DARCS_DONT_ESCAPE_8BIT=1`.
Perhaps we could *always* display UTF-8 as UTF-8 unless noted otherwise
(perhaps by putting a file in the _darcs directory to explicitly select an
encoding other than UTF-8).
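A minimal sketch of that suggestion, assuming a hypothetical file `_darcs/prefs/metadata-encoding` (not an existing darcs preference) that holds an encoding name understood by `mkTextEncoding`; when the file is missing or empty, UTF-8 is used.

```haskell
import Control.Exception (IOException, try)
import Data.Char (isSpace)
import System.IO (TextEncoding, mkTextEncoding, utf8)

-- Hypothetical location of a per-repository override.
encodingPrefFile :: FilePath -> FilePath
encodingPrefFile repo = repo ++ "/_darcs/prefs/metadata-encoding"

-- UTF-8 unless the repository explicitly names another encoding.
metadataEncoding :: FilePath -> IO TextEncoding
metadataEncoding repo = do
  -- Force the whole file inside try so read errors are caught here too.
  r <- try (readFile (encodingPrefFile repo) >>= \s -> length s `seq` return s)
         :: IO (Either IOException String)
  case fmap trim r of
    Right name | not (null name) -> mkTextEncoding name
    _                            -> return utf8
  where
    trim = reverse . dropWhile isSpace . reverse . dropWhile isSpace

main :: IO ()
main = metadataEncoding "." >>= print
```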
|
|
Date | User | Action | Args |
2009-11-15 18:08:14 | tux_rocker | create | |
2010-11-07 16:01:12 | tux_rocker | link | issue1990 superseder |
2010-11-07 16:06:42 | tux_rocker | set | assignedto: tux_rocker |
2010-11-19 10:13:46 | kowey | set | messages: + msg13112; milestone: 2.5.0 |
2010-11-21 16:24:25 | tux_rocker | set | messages: + msg13156 |
2010-11-21 17:17:52 | mornfall | set | messages: + msg13160 |
2010-12-10 16:32:16 | kowey | set | milestone: 2.5.0 -> 2.5.1 |
2010-12-16 16:44:15 | kowey | set | messages: + msg13356 |
2010-12-17 10:11:21 | kowey | set | messages: + msg13373 |
2011-01-05 23:16:16 | ganesh | set | messages: + msg13466; milestone: 2.5.1 -> 2.5.2 |
2011-01-12 14:05:58 | asr | set | nosy: + asr |
2011-06-23 16:25:03 | nad | set | nosy: + nad; messages: + msg14549 |
2013-06-21 13:34:11 | rpglover64 | set | nosy: + rpglover64 |
2013-09-25 18:47:32 | rpglover64 | set | messages: + msg17050 |
2014-11-26 06:34:21 | ganesh | set | status: needs-reproduction -> needs-diagnosis/design; nosy: + ganesh; messages: + msg17859; assignedto: tux_rocker -> ganesh; milestone: 2.5.2 -> 2.10.0; superseder: - Should store patch metadata in utf-8 |
2014-12-08 19:14:59 | ganesh | link | patch1239 issues |
2014-12-08 19:20:23 | ganesh | set | messages: + msg17884 |
2015-02-05 14:49:56 | bfrk | set | messages: + msg18016; milestone: 2.10.0 -> 3.0.0 |
2015-08-10 06:07:18 | xancorreu | set | nosy: + xancorreu |
2015-08-11 13:54:08 | xancorreu | set | messages: + msg18722 |
|