darcs

Issue 267 wish: Support UTF-16 text files

Title: wish: Support UTF-16 text files
Priority: wishlist
Status: wont-fix
Nosy List: darcs-devel, dmitry.kurochkin, kowey, stephen_gryphon, thorkilnaur, tommy, tuomov

Created on 2006-09-17.12:52:47 by stephen_gryphon, last changed 2009-10-24.00:41:07 by admin.

Files
File name: stephen_gryphon.vcf
Uploaded: stephen_gryphon, 2006-09-17.12:52:40
Type: text/x-vcard
Messages
msg992 Author: stephen_gryphon Date: 2006-09-17.12:52:41
I think it would be good for darcs to include better support for Unicode.

UTF-8 seems to work okay, mostly because it was cleverly designed to be
largely compatible with ASCII, or at least with any system that
tolerates arbitrary high-byte values; for example, end-of-line markers
are the same. (The manual also describes an environment setting you can
use to get correct output.)

It would be nice to also have some minimal support for UTF-16, at the
very least a way to override the automatic treatment as binary for files
containing \0 (in UTF-16, basic ASCII characters include a \0 byte; for
example, 'A' is 00 41). I think there is already an item on the list to
allow this automatic treatment to be overridden, which goes part of the
way.
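
To illustrate the \0 problem, here is a minimal sketch. It assumes a binary heuristic that simply looks for a NUL byte; the function name `looks_binary` is hypothetical and this is not darcs's actual code.

```python
def looks_binary(data: bytes) -> bool:
    """Hypothetical heuristic: any NUL byte marks the content as binary."""
    return b"\x00" in data


# The same one-line text in two encodings.
utf8_text = "A\n".encode("utf-8")       # b'A\n'
utf16_text = "A\n".encode("utf-16-be")  # b'\x00A\x00\n'

print(looks_binary(utf8_text))   # plain ASCII/UTF-8 contains no NUL byte
print(looks_binary(utf16_text))  # every ASCII character carries a 00 byte
```

Under that assumption, any UTF-16 file with ASCII-range characters is classified as binary, which is why an override is the minimum needed.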

The other main issue would probably be detecting end of line for diffs.
In some cases this may actually work by coincidence, with the last 0A
(or 0D) being recognised despite the preceding 00 (depending on byte
order). This isn't a complete solution, however, so proper end-of-line
detection is needed.
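
The difference between byte-wise and code-unit-aware line splitting can be sketched as follows. Both function names are hypothetical, the byte order is assumed to be known in advance (no BOM handling), and only U+000A is treated as a line terminator.

```python
def split_lines_bytewise(data: bytes) -> list[bytes]:
    """Split after every 0x0A byte, ignoring 2-byte code-unit boundaries."""
    parts = data.split(b"\n")
    lines = [part + b"\n" for part in parts[:-1]]
    if parts[-1]:
        lines.append(parts[-1])
    return lines


def split_lines_utf16(data: bytes, byteorder: str) -> list[bytes]:
    """Split after each U+000A code unit, scanning two bytes at a time.

    byteorder is "big" or "little" (assumed known by the caller).
    """
    newline = (0x000A).to_bytes(2, byteorder)
    lines, start = [], 0
    for i in range(0, len(data) - 1, 2):
        if data[i:i + 2] == newline:
            lines.append(data[start:i + 2])
            start = i + 2
    if start < len(data):
        lines.append(data[start:])  # trailing text without a newline
    return lines


le = "a\nb\n".encode("utf-16-le")  # 61 00 0A 00 62 00 0A 00
be = "a\nb\n".encode("utf-16-be")  # 00 61 00 0A 00 62 00 0A
print(split_lines_bytewise(le))    # the 00 after each 0A lands on the next line
print(split_lines_bytewise(be) == split_lines_utf16(be, "big"))
```

In big-endian order the byte-wise split happens to agree with the code-unit split for this input, but it would still break on a character such as U+010A, whose encoding contains a 0A byte.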

Apart from detecting end of line, the rest of the system should be
adequate -- once lines are correctly identified, a difference in any
byte, whether or not it is part of a multi-byte Unicode sequence, still
makes the lines different. More advanced features, like identifying the
start/end character of the difference within the line, or ignoring
white space, would require more work.
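
As a rough illustration of that point: once a file has been split into lines, the lines can be compared as opaque byte strings. The sketch below uses Python's standard `difflib` rather than darcs's own hunk algorithm, and the helper name `changed_lines` is hypothetical.

```python
import difflib


def changed_lines(old: list[bytes], new: list[bytes]):
    """Return the non-equal opcodes between two lists of raw byte lines."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Lines are already split; the diff never needs to decode them.
old = [s.encode("utf-16-le") for s in ("one\n", "two\n", "three\n")]
new = [s.encode("utf-16-le") for s in ("one\n", "tw\u00f6\n", "three\n")]

for tag, removed, added in changed_lines(old, new):
    print(tag, removed, added)
```

Only the line whose bytes differ is reported; no knowledge of the encoding is needed at this stage.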

- Sly
msg999 Author: kowey Date: 2006-09-18.20:21:09
Tuomo has made some comments on the darcs-devel thread:

http://www.abridgegame.org/pipermail/darcs-devel/2006-September/004775.html

No comment myself (yet)
msg4073 Author: droundy Date: 2008-03-28.18:52:39
We won't add support for utf-16 to darcs' hunk patches.  Text patches to a
utf-16 file would need to be a new patch type.  I wouldn't object to someone
writing such a patch type, but I doubt it'll happen, so I'm marking this wish as
wont-fix.

David
History
Date User Action Args
2006-09-17 12:52:47 stephen_gryphon create
2006-09-18 20:21:15 kowey set status: unread -> unknown
nosy: + tuomov1
messages: + msg999
title: Wishlist item for darcs -> Support UTF-16 text files
2008-02-07 05:10:16 markstos set status: unknown -> deferred
nosy: + beschmi
title: Support UTF-16 text files -> wish: Support UTF-16 text files
2008-03-28 18:52:40 droundy set nosy: droundy, tommy, beschmi, kowey, tuomov1, stephen_gryphon
messages: + msg4073
2008-05-07 15:30:12 droundy set status: deferred -> wont-fix
nosy: + dagit
2009-08-06 17:51:31 admin set nosy: + markstos, jast, Serware, dmitry.kurochkin, darcs-devel, zooko, mornfall, simon, thorkilnaur, - droundy, tuomov1, stephen_gryphon
2009-08-06 20:51:57 admin set nosy: - beschmi
2009-08-10 21:55:39 admin set nosy: + tuomov1, stephen_gryphon, - markstos, darcs-devel, zooko, jast, Serware, mornfall
2009-08-10 23:56:10 admin set nosy: - dagit
2009-08-25 18:02:54 admin set nosy: + darcs-devel, - simon
2009-08-27 13:58:28 admin set nosy: tommy, kowey, darcs-devel, tuomov1, stephen_gryphon, thorkilnaur, dmitry.kurochkin
2009-10-23 22:47:28 admin set nosy: + tuomov12345, - tuomov1
2009-10-24 00:12:09 admin set nosy: + tuomov1, - tuomov12345
2009-10-24 00:41:07 admin set nosy: + tuomov, - tuomov1