darcs

Patch 167 Reintroduce UTF-8 tagging (and 3 more)

Title Reintroduce UTF-8 tagging (and 3 more)
Superseder Nosy List WorldMaker, dagit, darcs-users, kowey, mornfall, tux_rocker
Related Issues
Status accepted Assigned To tux_rocker
Milestone

Created on 2010-02-21.16:06:37 by tux_rocker, last changed 2011-05-10.22:36:46 by darcswatch. Tracked on DarcsWatch.

Files
File name Status Uploaded Type Edit Remove
reintroduce-utf_8-tagging.dpatch tux_rocker, 2010-02-21.16:06:36 text/x-darcs-patch
reintroduce-utf_8-tagging.dpatch tux_rocker, 2010-03-21.13:36:14 text/x-darcs-patch
resolve-issue64_-store-metadata-as-utf_8_-autodetect-utf_8_-and-don_t-normalize-to-nfc.dpatch tux_rocker, 2010-05-02.12:52:11 text/x-darcs-patch
unnamed tux_rocker, 2010-02-21.16:06:36 text/plain
unnamed dagit, 2010-03-18.02:36:50 text/html
unnamed tux_rocker, 2010-03-21.13:36:14 text/plain
unnamed tux_rocker, 2010-05-02.12:52:11
See mailing list archives for discussion on individual patches.
Messages
msg10059 (view) Author: tux_rocker Date: 2010-02-21.16:06:36
This fixes some things in the UTF-8 metadata that's in the HEAD already. It
reintroduces the 'Ignore-this: UTF-8' line to be able to distinguish
normalized and denormalized strings, and it fixes the crash bug reported as
issue1739.

The issue1739 fix makes ColorPrinter escape characters greater than 127 as
"<U+XXXX>" instead of "\xx". I borrowed that notation from less (the pager).
The "U+XXXX" part is an accepted way of referring to Unicode code points.

What remains to do is:
  * remove the stopgap home-grown ICU binding that was intended for the 2.4
    release and replace it by text-icu
  * find a way of using ICU without bundling it or making installing darcs
    disproportionally difficult on Windows

2 patches for repository http://darcs.net/:

Sun Feb  7 18:36:56 CET 2010  Reinier Lamers <tux_rocker@reinier.de>
  * Reintroduce UTF-8 tagging

Sun Feb 21 16:56:14 CET 2010  Reinier Lamers <tux_rocker@reinier.de>
  * resolve issue1739: make ColorPrinter handle characters > 255
Attachments
msg10245 (view) Author: dagit Date: 2010-03-18.02:36:50
Any volunteers to review this?

Jason

On Sun, Feb 21, 2010 at 9:06 AM, Reinier Lamers <bugs@darcs.net> wrote:

>
> New submission from Reinier Lamers <tux_rocker@reinier.de>:
>
> This fixes some things in the UTF-8 metadata that's in the HEAD already. It
> reintroduces the 'Ignore-this: UTF-8' line to be able to distinguish
> normalized and denormalized strings, and it fixes the crash bug reported as
> issue1739.
>
> The issue1739 fix makes ColorPrinter escape characters greater than 127 as
> "<U+XXXX>" instead of "\xx". I borrowed that notation from less (the
> pager).
> The "U+XXXX" part is an accepted way of referring to Unicode code points.
>
> What remains to do is:
>  * remove the stopgap home-grown ICU binding that was intended for the 2.4
>    release and replace it by text-icu
>  * find a way of using ICU without bundling it or making installing darcs
>    disproportionally difficult on Windows
>
> 2 patches for repository http://darcs.net/:
>
> Sun Feb  7 18:36:56 CET 2010  Reinier Lamers <tux_rocker@reinier.de>
>  * Reintroduce UTF-8 tagging
>
> Sun Feb 21 16:56:14 CET 2010  Reinier Lamers <tux_rocker@reinier.de>
>  * resolve issue1739: make ColorPrinter handle characters > 255
>
> ----------
> files: reintroduce-utf_8-tagging.dpatch, unnamed
> messages: 10059
> nosy: darcs-users, tux_rocker
> status: needs-review
> title: Reintroduce UTF-8 tagging (and 1 more)
>
> __________________________________
> Darcs bug tracker <bugs@darcs.net>
> <http://bugs.darcs.net/patch167>
> __________________________________
> _______________________________________________
> darcs-users mailing list
> darcs-users@darcs.net
> http://lists.osuosl.org/mailman/listinfo/darcs-users
>
>
Attachments
msg10265 (view) Author: mornfall Date: 2010-03-18.08:14:25
Jason Dagit <dagit@codersbase.com> writes:
>     This fixes some things in the UTF-8 metadata that's in the HEAD already. It
>     reintroduces the 'Ignore-this: UTF-8' line to be able to distinguish
>     normalized and denormalized strings, and it fixes the crash bug reported as
>     issue1739.
I am still opposed to Ignore-This hacks.


>     Sun Feb 21 16:56:14 CET 2010  Reinier Lamers <tux_rocker@reinier.de>
>      * resolve issue1739: make ColorPrinter handle characters > 255

>     The issue1739 fix makes ColorPrinter escape characters greater than 127 as
>     "<U+XXXX>" instead of "\xx". I borrowed that notation from less (the
>     pager).
>     The "U+XXXX" part is an accepted way of referring to Unicode code points.
As for this part of the patch, it looks ok and could be pushed, as far
as I can tell. Does not seem to depend on the former.

Yours,
   Petr.
msg10284 (view) Author: tux_rocker Date: 2010-03-19.13:44:50
Hi Petr,

Op donderdag 18 maart 2010 09:14 schreef je:
> Petr Ročkai <me@mornfall.net> added the comment:
> 
> Jason Dagit <dagit@codersbase.com> writes:
> >     This fixes some things in the UTF-8 metadata that's in the HEAD 
already. It
> >     reintroduces the 'Ignore-this: UTF-8' line to be able to distinguish
> >     normalized and denormalized strings, and it fixes the crash bug 
reported as
> >     issue1739.
> I am still opposed to Ignore-This hacks.

Is there an alternative? And in what way is it a hack? The "Ignore-this:" 
mechanism is intended to give information about a patch to the processing 
darcs, isn't it?

Reinier
msg10301 (view) Author: WorldMaker Date: 2010-03-19.17:35:55
Reinier Lamers wrote:
>> I am still opposed to Ignore-This hacks.
> 
> Is there an alternative? And in what way is it a hack? The "Ignore-this:" 
> mechanism is intended to give information about a patch to the processing 
> darcs, isn't it?

It's been discussed before that patches should have a real, reliably 
extensible metadata format, rather than the quick hacks that are the 
Ignore-This lines, which look stupid when you look at the real patch 
comment and are very obviously a hack anytime you see a patch in the wild.

Of course, that discussion gets derailed in the debate between coming up 
with a new, better patch format in general or using the existing long 
comments ala Ignore-This, but perhaps with a more, strictly standardized 
MIME-like or email header-like approach. Of course any such approach 
will break compatibility with one or more tools...

--
--Max Battcher--
http://worldmaker.net
msg10366 (view) Author: tux_rocker Date: 2010-03-21.13:36:14
I think this brings the UTF-8 metadata stuff in a state that can be in darcs
2.5.

It stops requiring ICU. Normalization is left to the future Unicode-aware
patch matching code.

Also, darcs no longer crashes on metadata that has the
'Ignore-this: UTF-8' line but is not UTF-8. Instead, it puts in the U+FFFD
replacement character. If you see that on your screen, it's a sign that
something is wrong, but at least it still allows you to unpull your broken
patch.

4 patches for repository http://darcs.net/:

Sun Feb  7 18:36:56 CET 2010  Reinier Lamers <tux_rocker@reinier.de>
  * Reintroduce UTF-8 tagging

Sun Feb 21 16:56:14 CET 2010  Reinier Lamers <tux_rocker@reinier.de>
  * resolve issue1739: make ColorPrinter handle characters > 255

Sun Mar 21 14:20:13 CET 2010  Reinier Lamers <tux_rocker@reinier.de>
  * Give up on storing metadata normalized to NFC

Sun Mar 21 14:22:30 CET 2010  Reinier Lamers <tux_rocker@reinier.de>
  * Don't crash on patch metadata that falsely claims it's UTF-8
Attachments
msg10428 (view) Author: kowey Date: 2010-03-23.01:04:01
Three points:

- I've opened issue1787 in light of Max's comments.
- Whatever mechanism we need must be backwards compatible
- Petr's objections (if I understand correctly) are about tagging the
patches at all (and not the syntax we use for tagging).

So I think we should not block this patch on the conventional
considerations (Ignore-this: vs some other tagging scheme).

Reinier: could you comment on Petr's reply to the high-level feedback?
http://lists.osuosl.org/pipermail/darcs-users/2009-December/022640.html

(Perhaps off-tracker if it's going to open a discussion)

Meanwhile, I'll try to push the issue1739 fix
msg10441 (view) Author: mornfall Date: 2010-03-23.02:54:24
Eric Kow <bugs@darcs.net> writes:
> - Petr's objections (if I understand correctly) are about tagging the
> patches at all (and not the syntax we use for tagging).
Well, actually both. I dislike abusing ignore-this, and in this case I
think it is completely redundant as well. I.e. I still think that
anything that decodes as valid utf8 should be treated as utf8 and the
rest as locale-encoded (and if the locale happens to be utf8, output
error characters).

You have to do this anyway for untagged patches, and if you only write
utf8 patches in the future, these will obviously decode as utf8
correctly, ignore-this or not.

As for normalisation, I would defer this to match time. Solves both the
hard-ICU-dependency problem at a low cost and it avoids requiring extra
tagging. (In fact, it may be *more* efficient to do this at match
time. The only way to convince me of any performance benefits of
ignore-this tagging is to produce profiling data supporting that.)

So I don't think this leaves much in favour of ignore-this, does it?
(Please bring up anything I might have omitted.)

Yours,
   Petr.
msg10892 (view) Author: tux_rocker Date: 2010-05-02.12:52:11
This is another bundle that I hope will conclude the UTF-8 work. Most
importantly, it removes the normalization to NFC of patch metadata when they
are stored. So we no longer require ICU for building, which makes building on
Windows easier. As a consequence, when implementing Unicode-aware matching
(issue1692), the matching code will have to normalize.

4 patches for repository http://darcs.net/:

Sun May  2 14:12:06 CEST 2010  Reinier Lamers <tux_rocker@reinier.de>
  * resolve issue64: store metadata as UTF-8, autodetect UTF-8, and don't normalize to NFC

Sun May  2 14:13:56 CEST 2010  Reinier Lamers <tux_rocker@reinier.de>
  * Document behavior of text decoding functions in case of malformed input

Sun May  2 14:15:20 CEST 2010  Reinier Lamers <tux_rocker@reinier.de>
  * Add some haddock in Darcs.Patch.Info

Sun May  2 14:17:28 CEST 2010  Reinier Lamers <tux_rocker@reinier.de>
  * Rename metadataStringTest to metadataDecodingTest
Attachments
msg10919 (view) Author: darcswatch Date: 2010-05-04.07:06:07
This patch bundle (with 4 patches) was just applied to the repository http://darcs.net/.
This message was brought to you by DarcsWatch
http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-61798fe9363911b19cd601168da6001c38ac9ce8
msg14439 (view) Author: darcswatch Date: 2011-05-10.22:36:46
This patch bundle (with 4 patches) was just applied to the repository http://darcs.net/reviewed.
This message was brought to you by DarcsWatch
http://darcswatch.nomeata.de/repo_http:__darcs.net_reviewed.html#bundle-61798fe9363911b19cd601168da6001c38ac9ce8
History
Date User Action Args
2010-02-21 16:06:38tux_rockercreate
2010-02-21 16:08:51darcswatchsetdarcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-c11ab5ea38cadc4f67f65c28a882658840c05f12
2010-03-18 02:36:51dagitsetfiles: + unnamed
nosy: + dagit
messages: + msg10245
2010-03-18 08:14:25mornfallsetnosy: + mornfall
messages: + msg10265
2010-03-19 13:44:52tux_rockersetmessages: + msg10284
2010-03-19 17:35:57WorldMakersetnosy: + WorldMaker
messages: + msg10301
2010-03-21 13:36:15tux_rockersetfiles: + reintroduce-utf_8-tagging.dpatch, unnamed
messages: + msg10366
title: Reintroduce UTF-8 tagging (and 1 more) -> Reintroduce UTF-8 tagging (and 3 more)
2010-03-21 13:37:39darcswatchsetdarcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-c11ab5ea38cadc4f67f65c28a882658840c05f12 -> http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-bcda5aef02bf6be0729c517d0ef006f5d3e2d2a3
2010-03-23 01:04:04koweysetstatus: needs-review -> review-in-progress
nosy: + kowey
messages: + msg10428
assignedto: tux_rocker
2010-03-23 02:54:25mornfallsetmessages: + msg10441
2010-03-26 14:18:38koweysetassignedto: tux_rocker -> (no value)
2010-04-01 12:40:00koweysetstatus: review-in-progress -> followup-in-progress
assignedto: tux_rocker
2010-05-02 12:52:11tux_rockersetfiles: + resolve-issue64_-store-metadata-as-utf_8_-autodetect-utf_8_-and-don_t-normalize-to-nfc.dpatch, unnamed
messages: + msg10892
2010-05-02 12:53:24darcswatchsetdarcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-bcda5aef02bf6be0729c517d0ef006f5d3e2d2a3 -> http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-61798fe9363911b19cd601168da6001c38ac9ce8
2010-05-04 07:06:07darcswatchsetstatus: followup-in-progress -> accepted
messages: + msg10919
2011-05-10 20:05:55darcswatchsetdarcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-61798fe9363911b19cd601168da6001c38ac9ce8 -> http://darcswatch.nomeata.de/repo_http:__darcs.net_reviewed.html#bundle-bcda5aef02bf6be0729c517d0ef006f5d3e2d2a3
2011-05-10 22:07:59darcswatchsetdarcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_reviewed.html#bundle-bcda5aef02bf6be0729c517d0ef006f5d3e2d2a3 -> http://darcswatch.nomeata.de/repo_http:__darcs.net_reviewed.html#bundle-c11ab5ea38cadc4f67f65c28a882658840c05f12
2011-05-10 22:36:46darcswatchsetmessages: + msg14439