Created on 2011-07-05.17:49:58 by kerneis, last changed 2011-12-30.10:11:09 by kerneis.
File name |
Status |
Uploaded |
Type |
Edit |
Remove |
fix-missing-langinfo_h-on-windows.dpatch
|
|
kerneis,
2011-12-01.09:01:38
|
application/x-darcs-patch |
|
|
fix-test-suite-for-new-makeemail-signature.dpatch
|
|
kerneis,
2011-11-29.15:01:01
|
application/x-darcs-patch |
|
|
patch-preview.txt
|
|
kerneis,
2011-11-29.15:01:01
|
text/x-darcs-patch |
|
|
patch-preview.txt
|
|
ganesh,
2011-11-30.19:40:07
|
text/x-darcs-patch |
|
|
patch-preview.txt
|
|
kerneis,
2011-12-01.09:01:38
|
text/x-darcs-patch |
|
|
unnamed
|
|
kerneis,
2011-11-29.15:01:01
|
|
|
|
unnamed
|
|
ganesh,
2011-11-30.19:40:07
|
|
|
|
unnamed
|
|
kerneis,
2011-12-01.09:01:38
|
|
|
|
update-text-dep-to-make-sure-we-have-decodeutf8_.dpatch
|
|
ganesh,
2011-11-30.19:40:07
|
application/x-darcs-patch |
|
|
use-utf_8-charset-for-darcs-send-in-case-of-non_ascii-characters.dpatch
|
dead |
kerneis,
2011-07-17.19:09:32
|
application/x-darcs-patch |
|
|
utility-functions-to-check-locale-encoding.dpatch
|
|
kerneis,
2011-08-29.15:44:35
|
application/x-darcs-patch |
|
|
See mailing list archives
for discussion on individual patches.
msg14574 (view) |
Author: kerneis |
Date: 2011-07-05.17:49:58 |
|
Hi,
this very simple patch fixes darcs send in order to handle non-ascii
characters in the message. Note this has nothing to do with the patch
bundle, nor with the patch preview. It's only about the message (like
the one you are reading currently).
Here is a demo: é è ề Ψ ޡ ߐ ह ჴ Ᏻ ‱ ⁂ ∰ ✈ ⢅ 𝀵
If the text uses non-ascii characters, I assume it is valid utf-8. It
will work on any reasonable modern system and it can’t be more broken
than the current behaviour anyway.
And please, don’t ask me to take the locale into account.
Best,
Gabriel
PS: I could write a similar patch for handling patch preview (I am of
course not talking about the actual bundle, only the text/x-darcs-patch
part). Since I’m not sure it the best option, I’ll first wait for this
one to be accepted.
1 patch for repository http://darcs.net/screened:
Tue Jul 5 15:43:31 CEST 2011 Gabriel Kerneis <kerneis@pps.jussieu.fr>
* Use utf-8 charset for darcs send in case of non-ascii characters
Attachments
|
msg14575 (view) |
Author: kerneis |
Date: 2011-07-08.15:50:42 |
|
Hi,
this follow-up patch should fix the main concern about warning non-UTF-8
users.
It might definitely be improved; in particular, I check twice if the
message contains ASCII characters, once in Send.hs and once in Email.hs.
I also wonder if locales are useful at all --- see comment in the fourth
patch about that, maybe detecting invalid UTF-8 encoding is enough (and
certainly safer). But I'd like to get some feedback before I hack yet
another version of this patch.
Also, would it be okay to depend on the utf8-string package, to use the
Codec.Binary.UTF8.String.isUTF8Encoded function?
http://hackage.haskell.org/packages/archive/utf8-string/
Please let me know what you think about it.
Best,
Gabriel
4 patches for repository http://darcs.net/screened:
Tue Jul 5 15:43:31 CEST 2011 Gabriel Kerneis <kerneis@pps.jussieu.fr>
* Use utf-8 charset for darcs send in case of non-ascii characters
Fri Jul 8 17:06:44 CEST 2011 Gabriel Kerneis <kerneis@pps.jussieu.fr>
* Add functions to get and check locale encoding in Darcs.Utils
Fri Jul 8 17:11:49 CEST 2011 Gabriel Kerneis <kerneis@pps.jussieu.fr>
* Avoid useless re-rendering of contents message
Fri Jul 8 17:32:13 CEST 2011 Gabriel Kerneis <kerneis@pps.jussieu.fr>
* Propose to abort before sending mail if current locale is not UTF-8
Attachments
|
msg14576 (view) |
Author: kowey |
Date: 2011-07-08.16:45:24 |
|
Hi,
On Fri, Jul 08, 2011 at 15:50:43 +0000, Gabriel Kerneis wrote:
> It might definitely be improved; in particular, I check twice if the
> message contains ASCII characters, once in Send.hs and once in Email.hs.
Checking does seem to be one good way to be considerate in the sense
that we only bother the user if we really have to.
But I was wondering: for patch metadata IO, Darcs already has the
auto-detection habit of trying UTF-8 first and falling back to
locale-specific decoding (see Darcs.Patch.Info).
Could we not do the same for reading in the messages for darcs send
emails? In other words, try to emit utf-8 systematically; but do not
necessarily require that the user be in a UTF-8 locale.
As for the code, I think we already use Data.Text.Encoding from the text
package in Darcs (see ByteStringUtils)
Thanks,
Eric
PS. I don't think I've mentioned this on list yet, but I've been on a
sort of Darcs holiday while I attend to some personal matters.
Hoping to be back in September.
--
Eric Kow <http://erickow.com>
|
msg14589 (view) |
Author: kerneis |
Date: 2011-07-13.13:21:21 |
|
It is probably clear from the discussion on darcs-users, but just in
case: please do NOT apply this patch to screened, I'll rework it and
send an improved version for 2.10 (it won't make it into 2.8 anyway).
|
msg14593 (view) |
Author: kerneis |
Date: 2011-07-17.19:09:32 |
|
Hi,
this is a third attempt at using utf-8 whenever possible in darcs send.
Here is how it works:
- if mail is made of ascii characters, send with content-type charset
set to ascii and ignore locale completely,
- if mail contains invalid utf-8 characters, propose to either abort
(and save mail content in a file) or ignore the error and send with
content-type charset set to utf-8 anyway (no support for other
charset when sending),
- if mail is valid utf-8, send with content-type charset set to utf-8;
additionnally print a warning if current locale is not recognised as
utf-8 (but do not propose to abort, assuming what looks like utf-8 is
utf-8).
The warnings are respectively:
**********
Warning: darcs expects an UTF-8 encoding when sending emails.
Your text is not valid UTF-8. Send anyway? [yn]
*** and ***
Warning: sending email as UTF-8 regardless of your locale (since it seems well-formed).
**********
Please, do not hesitate to rephrase them (Darcs.Commands.Send), I am not
a native speaker. Apart from this wording issue, I tested the patch
rather extensively and I think it is ready to get into screened.
Best,
Gabriel
7 patches for repository http://darcs.net/screened:
Tue Jul 5 15:43:31 CEST 2011 Gabriel Kerneis <kerneis@pps.jussieu.fr>
* Use utf-8 charset for darcs send in case of non-ascii characters
Fri Jul 8 17:06:44 CEST 2011 Gabriel Kerneis <kerneis@pps.jussieu.fr>
* Add functions to get and check locale encoding in Darcs.Utils
Fri Jul 8 17:11:49 CEST 2011 Gabriel Kerneis <kerneis@pps.jussieu.fr>
* Avoid useless re-rendering of contents message
Fri Jul 8 17:32:13 CEST 2011 Gabriel Kerneis <kerneis@pps.jussieu.fr>
* Propose to abort before sending mail if current locale is not UTF-8
Sun Jul 17 20:35:03 CEST 2011 Gabriel Kerneis <kerneis@pps.jussieu.fr>
* Add ByteString.isAscii
Sun Jul 17 20:35:52 CEST 2011 Gabriel Kerneis <kerneis@pps.jussieu.fr>
* Use ByteString.isAscii in Darcs.Email
Sun Jul 17 20:36:33 CEST 2011 Gabriel Kerneis <kerneis@pps.jussieu.fr>
* Improve UI for non-ascii messages in darcs send
Attachments
|
msg14595 (view) |
Author: jch |
Date: 2011-07-19.21:14:00 |
|
> - if mail contains invalid utf-8 characters, propose to either abort
> (and save mail content in a file) or ignore the error and send with
> content-type charset set to utf-8 anyway (no support for other
> charset when sending),
Shouldn't that be ``text/plain; charset=x-unknown''?
-- Juliusz
|
msg14596 (view) |
Author: kerneis |
Date: 2011-07-20.05:32:18 |
|
On Tue, Jul 19, 2011 at 11:03:19PM +0200, Juliusz Chroboczek wrote:
> > - if mail contains invalid utf-8 characters, propose to either abort
> > (and save mail content in a file) or ignore the error and send with
> > content-type charset set to utf-8 anyway (no support for other
> > charset when sending),
>
> Shouldn't that be ``text/plain; charset=x-unknown''?
Yeah. Or charset=unknown-8bit. Google says both are accepted by most clients.
--
Gabriel
|
msg14630 (view) |
Author: ganesh |
Date: 2011-08-13.11:37:25 |
|
did you forget to darcs add system_encoding.c?
Also the existing convention for .c files seems to be to be to put them
in src, not src/Darcs - since none of the ones we have is really Darcs
specific and I guess that'll also be true of system_encoding.c
|
msg14698 (view) |
Author: kerneis |
Date: 2011-08-29.15:34:27 |
|
On Sat, Aug 13, 2011 at 11:37:26AM +0000, Ganesh Sittampalam wrote:
> did you forget to darcs add system_encoding.c?
>
> Also the existing convention for .c files seems to be to be to put them
> in src, not src/Darcs - since none of the ones we have is really Darcs
> specific and I guess that'll also be true of system_encoding.c
Agreed. I'll send an updated patch.
--
Gabriel
|
msg14699 (view) |
Author: kerneis |
Date: 2011-08-29.15:44:35 |
|
Hi,
here is the latest (and hopefully the last) version of my patch to fix
charset in darcs send.
It adds a --charset option to let the user specify any charset.
Additionnally, it tries to guess ASCII (easy) and UTF-8 (and bit harder)
charsets. It also checks locale in case of UTF-8 to make sure there is
no discrepency, and asks before making bold assumptions.
I have tested it and it works for me but making unit tests seems a bit
hard (you would have to intercept the mail somehow and check headers).
Note that this is a fresh set of patches, the previous ones should be
discarded.
Best,
Gabriel
4 patches for repository http://darcs.net/screened:
Mon Aug 29 16:48:18 CEST 2011 Gabriel Kerneis <kerneis@pps.jussieu.fr>
* Utility functions to check locale encoding
Mon Aug 29 16:50:46 CEST 2011 Gabriel Kerneis <kerneis@pps.jussieu.fr>
* Add ByteStringUtils.isAscii
Mon Aug 29 17:09:29 CEST 2011 Gabriel Kerneis <kerneis@pps.jussieu.fr>
* Add a --charset flag for darcs send
Mon Aug 29 17:16:04 CEST 2011 Gabriel Kerneis <kerneis@pps.jussieu.fr>
* Guess ascii and utf-8 charsets
In case we cannot guess the charset, or when it is utf-8 but the locale
differs, ask the user before sending.
Attachments
|
msg14827 (view) |
Author: kerneis |
Date: 2011-11-29.15:01:01 |
|
Hi,
one trivial follow-up patch to fix test suite building.
-- Gabriel
1 patch for repository http://darcs.net/screened:
Tue Nov 29 15:43:04 CET 2011 Gabriel Kerneis <kerneis@pps.jussieu.fr>
* Fix test suite for new makeEmail signature
Attachments
|
msg14828 (view) |
Author: ganesh |
Date: 2011-11-30.19:40:07 |
|
1 patch for repository http://darcs.net/screened:
Wed Nov 30 18:06:17 GMT 2011 ganesh@earth.li
* update text dep to make sure we have decodeUtf8'
Attachments
|
msg14830 (view) |
Author: kerneis |
Date: 2011-12-01.09:01:38 |
|
1 patch for repository http://darcs.net/screened:
Thu Dec 1 09:55:39 CET 2011 Gabriel Kerneis <kerneis@pps.jussieu.fr>
* Fix missing langinfo.h on Windows
get_system_encoding will always return "utf8" on Windows, because darcs
only uses it for a sanity check which is useless in most cases.
Attachments
|
msg14909 (view) |
Author: ganesh |
Date: 2011-12-29.20:46:45 |
|
Looks fine to me, though unicode always scares me!
I tried sending an patch with the example text with and without these
patches and the difference was clear in my email client.
I also wrote a basic test for the utf-8 header and will push that along
with these patches.
|
msg14914 (view) |
Author: kerneis |
Date: 2011-12-30.10:11:08 |
|
On Thu, Dec 29, 2011 at 08:46:46PM +0000, Ganesh Sittampalam wrote:
> Looks fine to me, though unicode always scares me!
In fact, I think the locale check could be stricter (issuing a warning if the
locale is ASCII instead of UTF-8). I'll send a follow-up patch for that
purpose next week, but it is safe to apply as-is in the meantime this case
should almost never happen anyway.
Best regards,
--
Gabriel
|
|
Date |
User |
Action |
Args |
2011-07-05 17:49:58 | kerneis | create | |
2011-07-08 15:50:43 | kerneis | set | files:
+ unnamed, use-utf_8-charset-for-darcs-send-in-case-of-non_ascii-characters.dpatch, unnamed messages:
+ msg14575 |
2011-07-08 16:45:25 | kowey | set | messages:
+ msg14576 |
2011-07-13 13:21:21 | kerneis | set | status: needs-screening -> in-discussion messages:
+ msg14589 |
2011-07-17 19:08:37 | kerneis | set | status: in-discussion -> needs-screening |
2011-07-17 19:09:33 | kerneis | set | files:
+ unnamed, use-utf_8-charset-for-darcs-send-in-case-of-non_ascii-characters.dpatch, unnamed messages:
+ msg14593 title: Use utf-8 charset for darcs send in case of non-ascii ... -> Utf-8 encoding for darcs send (reloaded) |
2011-07-17 19:13:03 | kerneis | set | files:
- unnamed |
2011-07-17 19:13:05 | kerneis | set | files:
- unnamed |
2011-07-17 19:13:07 | kerneis | set | files:
- unnamed |
2011-07-17 19:13:09 | kerneis | set | files:
- unnamed |
2011-07-17 19:13:11 | kerneis | set | files:
- unnamed |
2011-07-17 19:13:13 | kerneis | set | files:
- unnamed |
2011-07-17 19:13:23 | kerneis | set | files:
- use-utf_8-charset-for-darcs-send-in-case-of-non_ascii-characters.dpatch |
2011-07-17 19:13:26 | kerneis | set | files:
- use-utf_8-charset-for-darcs-send-in-case-of-non_ascii-characters.dpatch |
2011-07-19 21:14:00 | jch | set | messages:
+ msg14595 |
2011-07-20 05:32:19 | kerneis | set | messages:
+ msg14596 |
2011-08-13 11:37:26 | ganesh | set | status: needs-screening -> followup-requested assignedto: kerneis messages:
+ msg14630 |
2011-08-29 15:34:28 | kerneis | set | messages:
+ msg14698 |
2011-08-29 15:44:36 | kerneis | set | files:
+ unnamed, utility-functions-to-check-locale-encoding.dpatch, unnamed messages:
+ msg14699 title: Utf-8 encoding for darcs send (reloaded) -> Correct charset in darcs send |
2011-08-29 15:45:04 | kerneis | set | status: followup-requested -> needs-screening |
2011-08-29 15:45:57 | kerneis | set | files:
- unnamed |
2011-08-29 15:46:16 | kerneis | set | files:
- unnamed |
2011-11-29 15:01:01 | kerneis | set | files:
+ patch-preview.txt, fix-test-suite-for-new-makeemail-signature.dpatch, unnamed messages:
+ msg14827 title: Correct charset in darcs send -> Follow-up |
2011-11-30 19:40:07 | ganesh | set | files:
+ patch-preview.txt, update-text-dep-to-make-sure-we-have-decodeutf8_.dpatch, unnamed messages:
+ msg14828 |
2011-12-01 09:01:38 | kerneis | set | files:
+ patch-preview.txt, fix-missing-langinfo_h-on-windows.dpatch, unnamed messages:
+ msg14830 title: Follow-up -> Follow-up (fix windows build) |
2011-12-16 09:34:06 | ganesh | set | status: needs-screening -> needs-review |
2011-12-29 20:46:45 | ganesh | set | status: needs-review -> accepted-pending-tests assignedto: kerneis -> ganesh title: Follow-up (fix windows build) -> Utf-8 encoding for darcs send messages:
+ msg14909 nosy:
+ ganesh |
2011-12-29 21:36:22 | ganesh | set | status: accepted-pending-tests -> accepted |
2011-12-30 10:11:09 | kerneis | set | messages:
+ msg14914 |
|