Created on 2009-11-01.17:55:50 by tux_rocker, last changed 2009-12-24.15:49:56 by tux_rocker.
See mailing list archives
for discussion on individual patches.
msg9163 (view) |
Author: tux_rocker |
Date: 2009-11-01.17:55:44 |
|
Hi all,
Here is the patch that makes darcs store patch name, author and log as UTF-8 encoded strings.
By itself, this has no visible advantage. However, it makes it possible to:
* display non-ASCII patch and author names consistently in different locales
* produce closer-to-valid XML if the --xml-output option is given
* use --match with non-ASCII patch and author names on patches that were
recorded in another locale
The patches makes darcs differentiate between old-style and UTF-8 patches by
adding an 'Ignore-this: UTF-8' line to the patch log, and looking for that
line when interpreting the patch.
The patches make darcs treat old-style patches as locale-encoded. This may be
controversial. I did this because I suppose that people have already been
using darcs with non-ASCII locales, using its encoding ignorance as a feature.
They may have recorded patches with metadata in UTF-8 or other multibyte
encodings, and it would be sorry for them to start displaying such metadata
interpreted as ASCII or latin-1.
I attached a diff between the starting point and the end result of these
patches, as they contain some hunks that are reverted in later patches.
Regards,
Reinier
Attachments
|
msg9178 (view) |
Author: twb |
Date: 2009-11-02.04:38:04 |
|
John Cowan is a Unicode guru, and I asked him to take a quick look at
this patch. Here are his comments:
twb> Darcs is a VCS implemented in Haskell. There is a patch to make
twb> it understand (in some places) Unicode, instead of just treating
twb> everything as sequences of bytes. Can you recommend someone to
twb> review this patch from a "knows about Unicode" perspective?
jcowan> I'll look at it myself. Though I'm no Haskell maven, so I may
jcowan> have questions. [...] I'm looking at the .diff. I can sort
jcowan> of imagine the original. Hmm, the XML escaping looks
jcowan> problematic, unless it's something about laziness I don't
jcowan> understand. You need to convert & to & first before the
jcowan> others are done. Slavoj Žižek! Cool! [...] Except for the
jcowan> XML escaping, all looks fine.
I've also attached a copy of the bundle in unified format, for other
reviewers who aren't familiar with Darcs' internal format (.dpatch)
and don't like context diffs :-)
Attachments
|
msg9182 (view) |
Author: tux_rocker |
Date: 2009-11-02.09:05:56 |
|
It looks like John Cowan was reading the '.' operator the wrong way. The replace
of '&' by '"' is the last line of the function, but it is the first
operation that takes place.
|
msg9229 (view) |
Author: tux_rocker |
Date: 2009-11-09.22:11:24 |
|
Here's the bundle again. This serves two purposes:
(1) Hopefully the web interface will not corrupt the bundle (it contains lots of
funny characters that may get mutilated by eager MUAs and MTAs)
(2) It resolves two conflicts with recent developments
Attachments
|
msg9307 (view) |
Author: tux_rocker |
Date: 2009-11-15.15:57:34 |
|
Here is the same patch bundle as utf8-bundle-amended.dpatch, but now with
another patch that adds UTF-8 autodetection.
This can not go in until a new version of haskeline is released, which requires
a new version of utf8-string, that fixes a bug that is triggered by the unit
tests that this bundle does.
I am slightly worried that UTF-8 autodetection *may* give false positives
because we use such short strings to do our autodetection on. OTOH, we assume
that encodings are ASCII-compatible anyway, so sending ASCII through
unpackPSFromUTF8 instead of decodeLocale should do no harm.
|
msg9308 (view) |
Author: tux_rocker |
Date: 2009-11-15.15:58:47 |
|
Here is the same patch bundle as utf8-bundle-amended.dpatch, but now with
another patch that adds UTF-8 autodetection.
This can not go in until a new version of haskeline is released, which requires
a new version of utf8-string, that fixes a bug that is triggered by the unit
tests that this bundle does.
I am slightly worried that UTF-8 autodetection *may* give false positives
because we use such short strings to do our autodetection on. OTOH, we assume
that encodings are ASCII-compatible anyway, so sending ASCII through
unpackPSFromUTF8 instead of decodeLocale should do no harm.
Attachments
|
msg9463 (view) |
Author: tux_rocker |
Date: 2009-11-22.18:00:50 |
|
Again, a new patch file that can be applied to current head. I'll add another
one that is one big patch that contains the same changes as the 22 patches in
this bundle.
Changes: conflict resolution, remove a compiler warning in Darcs/Utils.hs
Attachments
|
msg9469 (view) |
Author: kowey |
Date: 2009-11-23.15:37:13 |
|
This patch was from before darcs-users was automatically made nosy on new bugs,
so people might not have seen Reinier's amended version.
Note also that the new Haskeline has been released (thanks to Eric M and Judah
for their rapid response!) so we can apply this once the reviewer OKs the bundle.
I had a look at the first version of this, so to maximise eyeball application,
I'd like the second reviewer to be somebody else.
|
msg9550 (view) |
Author: kowey |
Date: 2009-12-06.18:04:16 |
|
I propose that Florent take on this bundle for review. Somebody else will have
to push it on his behalf. I suspect that this is bundle will be enough for us
to get Florent on the Review Team officially.
|
msg9646 (view) |
Author: galbolle |
Date: 2009-12-18.13:42:55 |
|
It looks like the decision to go with the intended behaviour of this patch has
already been taken, that is, we should store new meta-data as UTF-8 and
auto-detect encoding of meta-data by looking for incorrect UTF-8 sequences, and
assuming that the encoding is correct if we find none. I cannot comment on these
decisions, and I only reviewed the patch to check that they were implemented as
they should.
I think the code is correct, in the manypatches version. There is a conflict in
unit.hs which can be fixed by the attached patch.
|
|
Date |
User |
Action |
Args |
2009-11-01 17:55:50 | tux_rocker | create | |
2009-11-01 19:17:14 | kowey | set | issues:
+ Should store patch metadata in utf-8 |
2009-11-02 04:38:05 | twb | set | files:
+ utf8-metadata.unified.diff nosy:
+ twb messages:
+ msg9178 |
2009-11-02 09:05:57 | tux_rocker | set | messages:
+ msg9182 |
2009-11-09 22:11:25 | tux_rocker | set | files:
+ utf8-bundle-amended.dpatch messages:
+ msg9229 |
2009-11-15 15:57:36 | tux_rocker | set | messages:
+ msg9307 |
2009-11-15 15:58:49 | tux_rocker | set | files:
+ autodetect-utf8.dpatch messages:
+ msg9308 |
2009-11-22 18:00:51 | tux_rocker | set | files:
+ utf8-metadata-20091122-manypatches.dpatch messages:
+ msg9463 |
2009-11-22 18:06:15 | tux_rocker | set | files:
+ utf8-metadata-20091122-onebigpatch.dpatch |
2009-11-23 15:37:14 | kowey | set | nosy:
+ kowey, darcs-users messages:
+ msg9469 |
2009-12-06 18:04:19 | kowey | set | nosy:
+ galbolle messages:
+ msg9550 assignedto: galbolle |
2009-12-11 12:40:35 | galbolle | set | status: needs-review -> review-in-progress |
2009-12-18 13:42:57 | galbolle | set | status: review-in-progress -> accepted-pending-tests messages:
+ msg9646 |
2009-12-18 13:55:49 | galbolle | set | files:
+ conflict-fix.dpatch |
2009-12-24 15:49:56 | tux_rocker | set | status: accepted-pending-tests -> accepted |
|