Created on 2008-01-09.04:19:29 by ppessi, last changed 2009-08-27.14:08:00 by admin.
File name | Uploaded | Type |
list-files-utf8 | ppessi, 2008-01-09.04:19:21 | application/octet-stream |
msg2385 | Author: ppessi | Date: 2008-01-09.04:19:21 |
When the repo contents are shown with darcs list, file names that
contain 8-bit characters (UTF-8, ISO-8859-*, or whatever) are converted to
UTF-8 as if they were ISO-8859-1.
For example, the file named "Ääliö älä lyö ööliä läikkyy" in ISO-8859-1 is
the byte string \e4\e4\6c\69\f6 ...
It is shown by, e.g., darcs changes --summary as the quoted byte string
[_\e4_][_\e4_]li[_\f6_] ...
With darcs list files it is shown as
./[_\c3_][_\a4_][_\c3_][_\a4_]li[_\c3_][_\b6_] (i.e., it has been
converted to UTF-8 as if it were ISO-8859-1).
If the file name is encoded in UTF-8, it has the byte string
\c3\84\c3\a4\6c\69\c3\b6 (each accented character is now encoded in two
bytes). It is shown by, e.g., darcs changes --summary as the quoted
byte string [_\c3_][_\84_][_\c3_][_\a4_]li[_\c3_][_\b6_]
However, with darcs list files it is shown as
[_\c3_][_\83_][_\c2_][_\84_][_\c3_][_\83_][_\c2_][_\a4_]li[_\c3_][_\83_][_\c2_][_\b6_]
that is, darcs list assumes that the byte string is an ISO-8859-1 string
and converts it to UTF-8.
A transcript (script output) from a UTF-8 terminal is attached.
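To make the byte arithmetic above concrete, here is a minimal Python model of the conversion the report describes: each raw byte is decoded as one ISO-8859-1 character and the resulting text is re-encoded as UTF-8. The names faulty_display_bytes and escape are hypothetical, and escape only imitates the bracketed quoting visible in the report (printable ASCII kept, every other byte written as [_\hh_]); darcs's real quoting rules may differ. Applied to the two reported filename prefixes, the model reproduces the darcs list files output byte for byte.

```python
# Assumed model (not darcs source code) of the faulty conversion: every raw
# filename byte becomes one ISO-8859-1 character, and that text is then
# written out as UTF-8.
def faulty_display_bytes(raw: bytes) -> bytes:
    return raw.decode("iso-8859-1").encode("utf-8")


def escape(raw: bytes) -> str:
    """Hypothetical imitation of the [_\\hh_] quoting seen in the report."""
    return "".join(
        chr(b) if 0x20 <= b < 0x7F else f"[_\\{b:02x}_]"
        for b in raw
    )


latin1_raw = b"\xe4\xe4li\xf6"            # ISO-8859-1 prefix as reported
utf8_raw = b"\xc3\x84\xc3\xa4li\xc3\xb6"  # UTF-8 prefix as reported

print(escape(faulty_display_bytes(latin1_raw)))
# [_\c3_][_\a4_][_\c3_][_\a4_]li[_\c3_][_\b6_]
print(escape(faulty_display_bytes(utf8_raw)))
# [_\c3_][_\83_][_\c2_][_\84_][_\c3_][_\83_][_\c2_][_\a4_]li[_\c3_][_\83_][_\c2_][_\b6_]

# The damage is a pure re-encoding, so it can be undone.
assert faulty_display_bytes(utf8_raw).decode("utf-8").encode("iso-8859-1") == utf8_raw
```

Because the mangling is a pure re-encoding, it is reversible: decoding the displayed bytes as UTF-8 and encoding the result as ISO-8859-1 recovers the original name.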
msg2413 | Author: droundy | Date: 2008-01-10.21:04:06 |
On Wed, Jan 09, 2008 at 04:19:29AM -0000, Pekka Pessi wrote:
> When the repo contents are shown with darcs list, the file names that
> contain 8-bit chars (UTF-8 or ISO-8859-* or whatever) are converted to
> UTF-8 as if they are ISO-8859-1.
I've just fixed this bug in list files (although the fix hasn't hit the
central repo yet). Thanks for the report!
However, the similar bug in the output of whatsnew, etc., has not yet been
fixed. Many of these issues date from my very naive assumption (long ago!)
that because Char is a 32 bit value, it is therefore a unicode value. :(
It would be nice to fix this for darcs-2, but it's a bit awkward. At a
minimum, I think I could hack things together so that darcs' text output
doesn't have these faulty conversions. Changing the on-disk format is a
bit more awkward.
--
David Roundy
Department of Physics
Oregon State University
msg2417 | Author: droundy | Date: 2008-01-10.22:55:04 |
I've just pushed code making the --darcs-2 format store and display
filenames as raw bytes, rather than trying to convert to utf-8. This means
anyone who already has a darcs repository in the experimental --darcs-2
format with non-ascii filenames will run into trouble. It also means that
I'd love to have some tests for this in the test suite, or at least some
testing by users who use non-ascii filenames. Thanks!
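A rough Python sketch of what "store and display filenames as raw bytes" means, under stated assumptions: the name is kept as the exact bytes found on disk, and only the display step escapes bytes that are not printable ASCII. The helpers escape and unescape are hypothetical, not darcs functions, and this sketch additionally escapes "[" so that the quoted form can be parsed back unambiguously. With no decode/encode step, the same code handles ISO-8859-1 and UTF-8 names alike, and for the two reported prefixes its output matches what darcs changes --summary showed in the original report.

```python
import re

# Assumed model of a raw-bytes filename format: keep the bytes exactly as
# they appear on disk and escape only at display time.


def escape(raw: bytes) -> str:
    """Quote bytes as [_\\hh_] unless they are printable ASCII other than "["."""
    # "[" itself is escaped, so every "[" in the output starts an escape
    # sequence and unescape() can never misread a literal bracket.
    return "".join(
        chr(b) if 0x20 <= b < 0x7F and b != 0x5B else f"[_\\{b:02x}_]"
        for b in raw
    )


_ESCAPE_RE = re.compile(r"\[_\\([0-9a-f]{2})_\]")


def unescape(text: str) -> bytes:
    """Invert escape(): turn each [_\\hh_] back into the single byte 0xhh."""
    out = bytearray()
    pos = 0
    for m in _ESCAPE_RE.finditer(text):
        out += text[pos:m.start()].encode("ascii")  # literal ASCII run
        out.append(int(m.group(1), 16))             # escaped byte
        pos = m.end()
    out += text[pos:].encode("ascii")
    return bytes(out)


latin1_raw = b"\xe4\xe4li\xf6"            # ISO-8859-1 prefix from the report
utf8_raw = b"\xc3\x84\xc3\xa4li\xc3\xb6"  # UTF-8 prefix from the report

print(escape(latin1_raw))  # [_\e4_][_\e4_]li[_\f6_]
print(escape(utf8_raw))    # [_\c3_][_\84_][_\c3_][_\a4_]li[_\c3_][_\b6_]
assert unescape(escape(latin1_raw)) == latin1_raw
assert unescape(escape(utf8_raw)) == utf8_raw
```

The round trip is lossless for any byte string, which is the property a raw-bytes format relies on.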
Date | User | Action | Args |
2008-01-09 04:19:30 | ppessi | create | |
2008-01-10 21:04:09 | droundy | set | status: unread -> unknown; nosy: + darcs-devel; messages: + msg2413 |
2008-01-10 22:55:05 | droundy | set | messages: + msg2417 |
2008-01-10 23:45:14 | droundy | set | status: unknown -> resolved-in-unstable |
2008-09-04 21:31:51 | admin | set | status: resolved-in-unstable -> resolved; nosy: + dagit |
2009-08-06 17:33:26 | admin | set | nosy: + markstos, jast, Serware, dmitry.kurochkin, zooko, mornfall, simon, thorkilnaur, - droundy, ppessi |
2009-08-06 20:30:54 | admin | set | nosy: - beschmi |
2009-08-10 22:10:30 | admin | set | nosy: + ppessi, - markstos, zooko, jast, Serware, mornfall |
2009-08-11 00:04:23 | admin | set | nosy: - dagit |
2009-08-25 17:48:10 | admin | set | nosy: - simon |
2009-08-27 14:08:00 | admin | set | nosy: tommy, kowey, darcs-devel, ppessi, thorkilnaur, dmitry.kurochkin |