Created on 2005-11-30.09:06:11 by bortzmeyer, last changed 2017-07-30.23:58:15 by gh.
msg121 (view) |
Author: bortzmeyer |
Date: 2005-11-30.09:06:11 |
|
darcs --xml has the following limitations:
1) There is no way to get the list of files (--verbose seems ignored)
2) The dates (in <date>) are in a proprietary format. IMHO, they should be in
W3C Schema xsd:date or in ISO 8601 ("2005-11-30T10:03:06+0100") or RFC 3339 (a
subset of ISO 8601)
|
msg130 (view) |
Author: droundy |
Date: 2005-11-30.13:42:21 |
|
On Wed, Nov 30, 2005 at 09:06:11AM +0000, Bortzmeyer wrote:
> darcs --xml has the following limitations:
From context, it's clear that you mean darcs changes --xml
> 1) There is no way to get the list of files (--verbose seems ignored)
You can get this with darcs annotate --xml. Adding --summary or --verbose
would be doable, but I'm downgrading this to wishlist, since the
functionality is already there.
> 2) The dates (in <date>) are in a proprietary format. IMHO, they should
> be in W3C Schema xsd:date or in ISO 8601 ("2005-11-30T10:03:06+0100") or
> RFC 3339 (a subset of ISO 8601)
You just need to add some dashes, a T and some colons and a +0000 to get ISO
8601. I don't think I'd want to output ISO 8601 until we know how to parse
it, and that's waiting on someone who wants to write a parser for it. (See
issue31). In the meantime, here's a useful converter for our proprietary
date format:
perl -pe 's/(....)(..)(..)(..)(..)(..)/$1-$2-$3T$4:$5:$6+0000/'
--
David Roundy
http://www.darcs.net
|
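For readers who would rather not use a regex, the same conversion can be sketched in Python. This is only an illustration: the function name `darcs_date_to_iso8601` is hypothetical, and it assumes (as the Perl one-liner above does) that the `<date>` value is a 14-digit `YYYYMMDDhhmmss` string in UTC.

```python
from datetime import datetime, timezone


def darcs_date_to_iso8601(stamp: str) -> str:
    """Convert a 14-digit timestamp (assumed to be UTC) to ISO 8601 / RFC 3339."""
    # strptime raises ValueError on impossible dates, which a plain regex
    # substitution would silently pass through.
    parsed = datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return parsed.isoformat()


print(darcs_date_to_iso8601("20051130090611"))  # 2005-11-30T09:06:11+00:00
```

Unlike the regex, this writes the offset as `+00:00`, the form RFC 3339 requires.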
msg140 (view) |
Author: bortzmeyer |
Date: 2005-12-01.20:48:00 |
|
Also, another limit is that "darcs changes --xml" does not add an "encoding" to
the XML declaration, so the XML flow is sometimes not well-formed.
For instance, I use ISO-8859-1 in my record messages and the XML document is wrong:
% xmllint --noout /tmp/blog.xml
/tmp/blog.xml:15: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xE9 0x3C 0x2F 0x6E
<name>RFC 3835 terminé</name>
But if I just add <?xml version="1.0" encoding="iso-8859-1"?> at the beginning,
xmllint is now happy. darcs changes should do it (an option to set the encoding,
maybe?)
|
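The workaround described in msg140 can be reproduced with a short Python sketch. It assumes the XML body carries no declaration of its own and that the caller already knows the encoding; the helper name `with_encoding_declaration` is hypothetical and is not a darcs feature.

```python
import xml.etree.ElementTree as ET


def with_encoding_declaration(body: bytes, encoding: str) -> bytes:
    """Prepend an XML declaration naming the encoding the body is already in."""
    declaration = f'<?xml version="1.0" encoding="{encoding}"?>\n'.encode("ascii")
    return declaration + body


# The same kind of input as in the xmllint report: Latin-1 bytes, no declaration.
latin1_body = "<name>RFC 3835 terminé</name>".encode("iso-8859-1")

try:
    ET.fromstring(latin1_body)  # the parser assumes UTF-8 and rejects byte 0xE9
    parsed_without_declaration = True
except ET.ParseError:
    parsed_without_declaration = False

element = ET.fromstring(with_encoding_declaration(latin1_body, "iso-8859-1"))
print(parsed_without_declaration)  # False
```

This only relabels the bytes. It does not convert them, so it works only when every message in the output shares one encoding.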
msg142 (view) |
Author: droundy |
Date: 2005-12-02.13:05:22 |
|
On Thu, Dec 01, 2005 at 08:48:00PM +0000, Bortzmeyer wrote:
> Also, another limit is that "darcs changes --xml" does not add an
> "encoding" to the XML declaration, so the XML flow is sometimes not
> well-formed.
The trouble is that darcs has no way of knowing the encoding of your
content...
> But if I just add <?xml version="1.0" encoding="iso-8859-1"?> at the
> beginning, xmllint is now happy. darcs changes should do it (an option to
> set the encoding, maybe?)
Indeed, that would be the solution, we'd need additional (optional?) input
indicating the encoding.
--
David Roundy
http://www.darcs.net
|
msg145 (view) |
Author: bortzmeyer |
Date: 2005-12-03.14:14:13 |
|
On Fri, Dec 02, 2005 at 01:05:22PM +0000,
David Roundy <bugs@darcs.net> wrote
a message of 25 lines which said:
...
> Indeed, that would be the solution, we'd need additional (optional?) input
> indicating the encoding.
The proper solution is probably to convert commit messages (short and
long) to UTF-8 before storing in the repository. At commit time,
darcs knows the encoding (through the locale) and can therefore convert
from/to.
This would allow the exchange of patches between people with different
encodings and would solve the XML problem for free.
The only problem is that someone has to code it :-) And darcs has to
accept repositories with non-UTF8 characters since they already exist
and we cannot obliterate them.
|
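The proposal in msg145 has two halves: convert to UTF-8 when recording, and tolerate legacy non-UTF-8 metadata when reading. A minimal Python sketch of both follows. The function names and the Latin-1 fallback are assumptions made for illustration; this is not how darcs is implemented.

```python
import locale


def message_to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode a commit message from the recorder's encoding to UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


def read_stored_message(stored: bytes, legacy_encoding: str = "iso-8859-1") -> str:
    """Decode stored metadata, falling back for patches recorded before conversion."""
    try:
        return stored.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: old patches are in one known legacy encoding.
        return stored.decode(legacy_encoding)


# At record time the source encoding would come from the user's locale.
locale_encoding = locale.getpreferredencoding(False)
print(locale_encoding)
```

The fallback is a heuristic: some Latin-1 byte sequences are also valid UTF-8, so a legacy message can occasionally be decoded wrongly.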
msg149 (view) |
Author: droundy |
Date: 2005-12-03.14:34:03 |
|
On Sat, Dec 03, 2005 at 02:14:13PM +0000, Bortzmeyer wrote:
> > Indeed, that would be the solution, we'd need additional (optional?)
> > input indicating the encoding.
>
> The proper solution is probably to convert commit messages (short and
> long) to UTF-8 before storing in the repository. At commit time,
> darcs knows the encoding (through the locale) and can therefore convert
> from/to.
>
> This would allow the exchange of patches between people with different
> encodings and would solve the XML problem for free.
Indeed, that would probably be the best solution for commit messages. But
for actual file contents, we'd still need some user input, since different
files may be in different encodings, and needn't actually match the current
locale (e.g. translations).
--
David Roundy
http://www.darcs.net
|
msg154 (view) |
Author: bortzmeyer |
Date: 2005-12-04.22:37:43 |
|
On Sat, Dec 03, 2005 at 02:34:04PM +0000,
David Roundy <bugs@darcs.net> wrote
a message of 27 lines which said:
> Indeed, that would probably be the best solution for commit
> messages.
Yes, and since it is the only thing (with the file names, another
tricky problem) displayed by "darcs changes", it would solve this part
of issue 33.
> But for actual file contents, we'd still need some user input, since
> different files may be in different encodings,
Yes, but it is another matter (and a much more difficult issue than
issue33).
|
msg6348 (view) |
Author: twb |
Date: 2008-10-21.01:48:44 |
|
I happened to come across the same encoding issue in Mercurial
recently, and I found that it behaves the way I expect. I have
included a transcript because it demonstrates how to test this.
Specifically, I am recording metadata from a UTF-8 system and then
looking to see what happens on an ASCII system and a Latin-1 system.
For the purposes of this transcript, "hg ci" corresponds to "darcs
record" and "hg log" corresponds to "darcs changes".
$ locale | grep LANG # what encoding is in use?
LANG=en_AU.utf8
$ locale -a # what encodings are available?
C
POSIX
en_AU
en_AU.iso88591
en_AU.utf8
$ hg ci -m 'Naïve test.' # make metadata with non-ASCII.
$ hg log # was it stripped out? (No.)
2008-10-21 Trent W. Buck <trentbuck@gmail.com>
* x:
Naïve test.
[2a43ed65ee0e] [tip]
$ LANG=C hg log | grep test # what happens to unencodable chars?
Na?ve test.
$ LANG=en_AU.iso88591 hg log | grep test | # does reencoding work?
> iconv --from iso-8859-1 # convert it back so we can check
Naïve test.
|
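The display-side behaviour seen in the transcript (a `?` for characters the terminal encoding cannot represent, and a faithful re-encoding otherwise) can be imitated in Python. This sketch illustrates the observed behaviour only and is not Mercurial's code; `encode_for_terminal` is a hypothetical name.

```python
def encode_for_terminal(message: str, terminal_encoding: str) -> bytes:
    """Encode metadata for display, substituting '?' for unencodable characters."""
    return message.encode(terminal_encoding, errors="replace")


message = "Naïve test."  # held internally as Unicode, stored as UTF-8

print(encode_for_terminal(message, "ascii"))       # b'Na?ve test.'
print(encode_for_terminal(message, "iso-8859-1"))  # b'Na\xefve test.'
```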
msg6350 (view) |
Author: twb |
Date: 2008-10-21.01:52:48 |
|
Regarding the problem of darcs changes --xml --verbose, from #xml on
irc.freenode.net:
12:47 <twb> Suppose I have a UTF-8 XML file. Is there a way to legally have arbitrary byte vectors -- not necessarily valid UTF-8 -- within this?
12:48 <twb> Specifically I am wrapping arbitrary files (with heterogeneous, unknown encodings) in some XML metadata.
12:48 <[wito]> twb: base64 encode them
12:48 <twb> [wito]: is that my only option?
12:48 <[wito]> twb yap
12:48 <twb> [wito]: OK, thank you.
|
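The base64 suggestion from the IRC log can be sketched as follows. The element name `file` and its attributes are invented for the example and do not reflect any actual darcs XML schema.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_bytes(name: str, payload: bytes) -> bytes:
    """Embed arbitrary bytes in a UTF-8 XML document as base64 text."""
    element = ET.Element("file", {"name": name, "encoding": "base64"})
    element.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(element, encoding="utf-8")


def unwrap_bytes(document: bytes) -> bytes:
    """Recover the original bytes from a document made by wrap_bytes."""
    element = ET.fromstring(document)
    return base64.b64decode(element.text)


payload = b"termin\xe9 \xff\x00"  # not valid UTF-8, includes a NUL byte
document = wrap_bytes("notes.txt", payload)
print(unwrap_bytes(document) == payload)  # True
```

The cost is that the wrapped content is no longer human-readable and grows by about one third.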
msg11350 (view) |
Author: kowey |
Date: 2010-06-10.09:07:08 |
|
I just noticed that this is a sort of umbrella bug tracking potentially
lots of different issues (and not just the character encoding one).
So far we've got:
- issue1872 : ISO8601 dates in XML
- issue1143 : character encodings
- list of files (fixed by darcs changes --xml --summary?)
|
|
Date | User | Action | Args |
2005-11-30 09:06:11 | bortzmeyer | create | |
2005-11-30 13:42:21 | droundy | set | status: unread -> unknown; nosy: droundy, tommy, bortzmeyer; messages: + msg130 |
2005-11-30 13:59:49 | droundy | link | issue33 superseder |
2005-11-30 13:59:49 | droundy | set | nosy: droundy, tommy, bortzmeyer; superseder: + Match ISO-8601 dates, wish: improve "darcs --xml" |
2005-11-30 14:01:01 | droundy | set | nosy: droundy, tommy, bortzmeyer; superseder: - wish: improve "darcs --xml" |
2005-11-30 14:01:01 | droundy | unlink | issue33 superseder |
2005-11-30 14:01:11 | droundy | link | issue33 superseder |
2005-11-30 14:01:11 | droundy | set | nosy: droundy, tommy, bortzmeyer; superseder: + wish: improve "darcs --xml" |
2005-12-01 20:48:00 | bortzmeyer | set | nosy: droundy, tommy, bortzmeyer; messages: + msg140 |
2005-12-02 13:05:22 | droundy | set | nosy: droundy, tommy, bortzmeyer; messages: + msg142 |
2005-12-03 14:14:13 | bortzmeyer | set | nosy: droundy, tommy, bortzmeyer; messages: + msg145 |
2005-12-03 14:34:04 | droundy | set | nosy: droundy, tommy, bortzmeyer; messages: + msg149 |
2005-12-04 22:37:44 | bortzmeyer | set | nosy: droundy, tommy, bortzmeyer; messages: + msg154 |
2008-02-05 15:49:15 | markstos | set | status: unknown -> deferred; nosy: + kowey, beschmi; title: Severe limitations of "darcs --xml" -> wish: improve "darcs --xml" |
2008-03-28 16:47:28 | droundy | unlink | issue33 superseder |
2008-03-28 16:47:28 | droundy | set | nosy: droundy, tommy, beschmi, kowey, bortzmeyer; superseder: - wish: improve "darcs --xml" |
2008-10-12 11:28:18 | tux_rocker | link | issue1143 superseder |
2008-10-12 22:28:08 | twb | set | nosy: + dmitry.kurochkin, dagit, twb, simon, thorkilnaur |
2008-10-21 01:48:47 | twb | set | nosy: droundy, tommy, beschmi, kowey, bortzmeyer, dagit, simon, twb, thorkilnaur, dmitry.kurochkin; messages: + msg6348 |
2008-10-21 01:52:50 | twb | set | nosy: droundy, tommy, beschmi, kowey, bortzmeyer, dagit, simon, twb, thorkilnaur, dmitry.kurochkin; messages: + msg6350 |
2009-08-06 17:40:28 | admin | set | nosy: + markstos, jast, Serware, darcs-devel, zooko, mornfall, - droundy, bortzmeyer, twb |
2009-08-06 20:46:37 | admin | set | nosy: - beschmi |
2009-08-10 21:58:34 | admin | set | nosy: + bortzmeyer, twb, - markstos, darcs-devel, zooko, jast, Serware, mornfall |
2009-08-10 23:57:51 | admin | set | nosy: - dagit |
2009-08-25 17:31:07 | admin | set | nosy: + darcs-devel, - simon |
2009-08-26 17:56:38 | kowey | set | status: deferred -> waiting-for; nosy: tommy, kowey, darcs-devel, bortzmeyer, twb, thorkilnaur, dmitry.kurochkin; superseder: + Should store patch metadata in utf-8, - Match ISO-8601 dates |
2009-08-27 14:33:40 | admin | set | nosy: tommy, kowey, darcs-devel, bortzmeyer, twb, thorkilnaur, dmitry.kurochkin |
2010-06-10 08:51:19 | kowey | set | nosy: - darcs-devel |
2010-06-10 09:00:32 | kowey | unlink | issue1143 superseder |
2010-06-10 09:07:10 | kowey | set | messages: + msg11350; superseder: + darcs changes --xml is not consistently encoded, use ISO8601 dates in XML output, - Should store patch metadata in utf-8 |
2017-07-30 23:58:15 | gh | set | status: waiting-for -> given-up |
|