darcs

Issue 33 wish: improve "darcs --xml"

Title: wish: improve "darcs --xml"
Priority: wishlist
Status: given-up
Milestone:
Resolved in:
Superseder: darcs changes --xml is not consistently encoded, use ISO8601 dates in XML output
View: 1143, 1872
Nosy List: bortzmeyer, dmitry.kurochkin, kowey, thorkilnaur, tommy, twb
Assigned To:
Topics:

Created on 2005-11-30.09:06:11 by bortzmeyer, last changed 2017-07-30.23:58:15 by gh.

Messages
msg121 (view) Author: bortzmeyer Date: 2005-11-30.09:06:11
darcs --xml has the following limitations:

1) There is no way to get the list of files (--verbose seems ignored)

2) The dates (in <date>) are in a proprietary format. IMHO, they should be in
W3C Schema xsd:date or in ISO 8601 ("2005-11-30T10:03:06+0100") or RFC 3339 (a
subset of ISO 8601)
msg130 (view) Author: droundy Date: 2005-11-30.13:42:21
On Wed, Nov 30, 2005 at 09:06:11AM +0000, Bortzmeyer wrote:
> darcs --xml has the following limitations:

From context, it's clear that you mean darcs changes --xml.

> 1) There is no way to get the list of files (--verbose seems ignored)

You can get this with darcs annotate --xml.  Adding --summary or --verbose
would be doable, but I'm downgrading this to wishlist, since the
functionality is already there.

> 2) The dates (in <date>) are in a proprietary format. IMHO, they should
> be in W3C Schema xsd:date or in ISO 8601 ("2005-11-30T10:03:06+0100") or
> RFC 3339 (a subset of ISO 8601)

You just need to add some dashes, a T, some colons and a +0000 to get ISO
8601.  I don't think I'd want to output ISO 8601 until we know how to parse
it, and that's waiting on someone who wants to write a parser for it.  (See
issue31).  In the meantime, here's a useful converter for our proprietary
date format:

perl -pe 's/(\d{4})(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)/$1-$2-$3T$4:$5:$6+0000/'
-- 
David Roundy
http://www.darcs.net
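
The same conversion can be sketched in Python for anyone post-processing the XML output. This is only an illustration: it assumes the date is a 14-digit YYYYMMDDhhmmss string in UTC, as the Perl one-liner above implies, and the function name darcs_date_to_iso8601 is made up for this example.

```python
from datetime import datetime, timezone


def darcs_date_to_iso8601(stamp: str) -> str:
    """Convert a 14-digit YYYYMMDDhhmmss stamp to an ISO 8601 string.

    Assumption: the stamp is in UTC, so the offset is always +0000.
    """
    # strptime validates the fields, so "20051345..." raises ValueError
    # instead of being silently reformatted.
    parsed = datetime.strptime(stamp, "%Y%m%d%H%M%S")
    parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.strftime("%Y-%m-%dT%H:%M:%S%z")


print(darcs_date_to_iso8601("20051130090611"))  # 2005-11-30T09:06:11+0000
```

Unlike the regular expression, this rejects strings that are not valid dates, which may or may not be what a converter wants.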
msg140 (view) Author: bortzmeyer Date: 2005-12-01.20:48:00
Another limitation is that "darcs changes --xml" does not add an "encoding" to
the XML declaration, so the XML output is sometimes not well-formed.

For instance, I use ISO-8859-1 in my record messages and the resulting XML document is rejected:

% xmllint --noout /tmp/blog.xml 
/tmp/blog.xml:15: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xE9 0x3C 0x2F 0x6E
        <name>RFC 3835 terminé</name>

But if I just add <?xml version="1.0" encoding="iso-8859-1"?> at the beginning,
xmllint is happy. darcs changes should do this itself (maybe with an option to
set the encoding?)
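
The effect of the declaration can be reproduced without xmllint. The sketch below uses Python's standard xml.etree.ElementTree parser as a stand-in; the <name> element is taken from the error output above, and the helper name parses is invented for this example.

```python
import xml.etree.ElementTree as ET

# The element from the xmllint error, encoded the way the repository stores it.
body = "<name>RFC 3835 terminé</name>".encode("iso-8859-1")

# The same bytes, preceded by a declaration naming the real encoding.
declared = b'<?xml version="1.0" encoding="iso-8859-1"?>\n' + body


def parses(document: bytes) -> bool:
    """Return True if the parser accepts the document as well-formed."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# Without a declaration the parser assumes UTF-8 and rejects byte 0xE9.
print("no declaration:", parses(body))
print("with declaration:", parses(declared))
```

With the declaration in place, the text of the element decodes back to "RFC 3835 terminé".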
msg142 (view) Author: droundy Date: 2005-12-02.13:05:22
On Thu, Dec 01, 2005 at 08:48:00PM +0000, Bortzmeyer wrote:
> Another limitation is that "darcs changes --xml" does not add an
> "encoding" to the XML declaration, so the XML output is sometimes not
> well-formed.

The trouble is that darcs has no way of knowing the encoding of your
content...

> But if I just add <?xml version="1.0" encoding="iso-8859-1"?> at the
> beginning, xmllint is happy. darcs changes should do this itself (maybe
> with an option to set the encoding?)

Indeed, that would be the solution, we'd need additional (optional?) input
indicating the encoding.
-- 
David Roundy
http://www.darcs.net
msg145 (view) Author: bortzmeyer Date: 2005-12-03.14:14:13
On Fri, Dec 02, 2005 at 01:05:22PM +0000,
 David Roundy <bugs@darcs.net> wrote 
 a message of 25 lines which said:

...
> Indeed, that would be the solution, we'd need additional (optional?) input
> indicating the encoding.

The proper solution is probably to convert commit messages (short and
long) to UTF-8 before storing them in the repository. At commit time,
darcs knows the encoding (through the locale) and can therefore convert
to and from UTF-8.

This would allow the exchange of patches between people with different
encodings and would solve the XML problem for free. 

The only problem is that someone has to code it :-) And darcs has to
keep accepting repositories with non-UTF-8 characters, since they already
exist and we cannot obliterate them.
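
As a rough illustration of the idea (Python, not darcs code), the two halves could look like the sketch below. The function names are hypothetical, and the Latin-1 fallback for old repositories is an assumption made for this example, not something decided in this thread.

```python
import locale
from typing import Optional


def message_to_utf8(raw: bytes, source_encoding: Optional[str] = None) -> bytes:
    """Re-encode a commit message to UTF-8 before it is stored.

    If no encoding is given, use the one announced by the current locale.
    """
    if source_encoding is None:
        source_encoding = locale.getpreferredencoding(False)
    return raw.decode(source_encoding).encode("utf-8")


def read_stored_message(raw: bytes) -> str:
    """Read stored metadata that may predate the UTF-8 convention."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: treat undecodable legacy data as Latin-1, which maps
        # every byte to some character and therefore never fails.
        return raw.decode("iso-8859-1")


latin1_message = "RFC 3835 terminé".encode("iso-8859-1")
stored = message_to_utf8(latin1_message, "iso-8859-1")
print(read_stored_message(stored))          # new-style, stored as UTF-8
print(read_stored_message(latin1_message))  # legacy bytes, via the fallback
```

The fallback guesses, so legacy messages written in some other encoding would still come out garbled; it only guarantees that reading never fails.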
msg149 (view) Author: droundy Date: 2005-12-03.14:34:03
On Sat, Dec 03, 2005 at 02:14:13PM +0000, Bortzmeyer wrote:
> > Indeed, that would be the solution, we'd need additional (optional?)
> > input indicating the encoding.
> 
> The proper solution is probably to convert commit messages (short and
> long) to UTF-8 before storing them in the repository. At commit time,
> darcs knows the encoding (through the locale) and can therefore convert
> to and from UTF-8.
> 
> This would allow the exchange of patches between people with different
> encodings and would solve the XML problem for free.

Indeed, that would probably be the best solution for commit messages.  But
for actual file contents, we'd still need some user input, since different
files may be in different encodings, and needn't actually match the current
locale (e.g. translations).
-- 
David Roundy
http://www.darcs.net
msg154 (view) Author: bortzmeyer Date: 2005-12-04.22:37:43
On Sat, Dec 03, 2005 at 02:34:04PM +0000,
 David Roundy <bugs@darcs.net> wrote 
 a message of 27 lines which said:

> Indeed, that would probably be the best solution for commit
> messages.

Yes, and since it is the only thing (with the file names, another
tricky problem) displayed by "darcs changes", it would solve this part
of issue 33.

> But for actual file contents, we'd still need some user input, since
> different files may be in different encodings,

Yes, but it is another matter (and a much more difficult issue than
issue33).
msg6348 (view) Author: twb Date: 2008-10-21.01:48:44
I happened to come across the same encoding issue in Mercurial
recently, and I found that it behaves the way I expect.  I have
included a transcript because it demonstrates how to test this.

Specifically, I am recording metadata from a UTF-8 system and then
looking to see what happens on an ASCII system and a Latin-1 system.

For the purposes of this transcript, "hg ci" corresponds to "darcs
record" and "hg log" corresponds to "darcs changes".

$ locale | grep LANG            # what encoding is in use?
LANG=en_AU.utf8
$ locale -a                     # what encodings are available?
C
POSIX
en_AU
en_AU.iso88591
en_AU.utf8
$ hg ci -m 'Naïve test.'        # make metadata with non-ASCII.
$ hg log                        # was it stripped out?  (No.)
2008-10-21  Trent W. Buck  <trentbuck@gmail.com>

	* x:
	Naïve test.
	[2a43ed65ee0e] [tip]

$ LANG=C hg log | grep test     # what happens to unencodable chars?
	Na?ve test.
$ LANG=en_AU.iso88591 hg log | grep test | # does reencoding work?
> iconv --from iso-8859-1       # convert it back so we can check
	Naïve test.
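
The behaviour in this transcript can be mimicked in a few lines of Python. This is a sketch of the general approach (store UTF-8, re-encode for display, substitute "?" for unencodable characters), not Mercurial's actual code; the function name render is invented here.

```python
def render(stored_utf8: bytes, terminal_encoding: str) -> bytes:
    """Re-encode stored UTF-8 metadata for a terminal using another encoding.

    Characters the target encoding cannot represent become "?".
    """
    text = stored_utf8.decode("utf-8")
    return text.encode(terminal_encoding, errors="replace")


stored = "Naïve test.".encode("utf-8")

print(render(stored, "ascii"))       # b'Na?ve test.'
print(render(stored, "iso-8859-1"))  # b'Na\xefve test.'

# Decoding the Latin-1 bytes again recovers the original message.
print(render(stored, "iso-8859-1").decode("iso-8859-1"))
```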
msg6350 (view) Author: twb Date: 2008-10-21.01:52:48
Regarding the problem of darcs changes --xml --verbose, from #xml on
irc.freenode.net:

12:47 <twb> Suppose I have a UTF-8 XML file.  Is there a way to legally have arbitrary byte vectors -- not necessarily valid UTF-8 -- within this?
12:48 <twb> Specifically I am wrapping arbitrary files (with heterogeneous, unknown encodings) in some XML metadata.
12:48 <[wito]> twb: base64 encode them
12:48 <twb> [wito]: is that my only option?
12:48 <[wito]> twb yap
12:48 <twb> [wito]: OK, thank you.
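
A minimal Python sketch of the base64 suggestion follows. The <file> element and its "name" and "encoding" attributes are hypothetical and not part of any darcs XML format; they only show that bytes which are not valid UTF-8 survive a round trip through a UTF-8 XML document.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_bytes(name: str, payload: bytes) -> bytes:
    """Wrap arbitrary bytes in a UTF-8 XML document using base64."""
    element = ET.Element("file", {"name": name, "encoding": "base64"})
    # base64 output is plain ASCII, so it is always valid inside UTF-8 XML.
    element.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(element, encoding="utf-8")


def unwrap_bytes(document: bytes) -> bytes:
    """Recover the original bytes from a document made by wrap_bytes."""
    element = ET.fromstring(document)
    return base64.b64decode(element.text)


payload = b"Latin-1 caf\xe9 and stray bytes \xff\xfe\x00"
document = wrap_bytes("example.txt", payload)
print(document.decode("utf-8"))
print(unwrap_bytes(document) == payload)
```

The cost is that the wrapped content is no longer readable or searchable in the XML itself, which matters if the output is meant for humans as well as tools.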
msg11350 (view) Author: kowey Date: 2010-06-10.09:07:08
I just noticed that this is a sort of umbrella bug, potentially tracking
lots of different issues (and not just the character encoding one).

So far we've got:
- issue1872 : ISO8601 dates in XML
- issue1143 : character encodings
- list of files (fixed by darcs changes --xml --summary?)
History
Date                 User        Action  Args
2005-11-30 09:06:11  bortzmeyer  create
2005-11-30 13:42:21  droundy     set     status: unread -> unknown
                                         nosy: droundy, tommy, bortzmeyer
                                         messages: + msg130
2005-11-30 13:59:49  droundy     link    issue33 superseder
2005-11-30 13:59:49  droundy     set     nosy: droundy, tommy, bortzmeyer
                                         superseder: + Match ISO-8601 dates, wish: improve "darcs --xml"
2005-11-30 14:01:01  droundy     set     nosy: droundy, tommy, bortzmeyer
                                         superseder: - wish: improve "darcs --xml"
2005-11-30 14:01:01  droundy     unlink  issue33 superseder
2005-11-30 14:01:11  droundy     link    issue33 superseder
2005-11-30 14:01:11  droundy     set     nosy: droundy, tommy, bortzmeyer
                                         superseder: + wish: improve "darcs --xml"
2005-12-01 20:48:00  bortzmeyer  set     nosy: droundy, tommy, bortzmeyer
                                         messages: + msg140
2005-12-02 13:05:22  droundy     set     nosy: droundy, tommy, bortzmeyer
                                         messages: + msg142
2005-12-03 14:14:13  bortzmeyer  set     nosy: droundy, tommy, bortzmeyer
                                         messages: + msg145
2005-12-03 14:34:04  droundy     set     nosy: droundy, tommy, bortzmeyer
                                         messages: + msg149
2005-12-04 22:37:44  bortzmeyer  set     nosy: droundy, tommy, bortzmeyer
                                         messages: + msg154
2008-02-05 15:49:15  markstos    set     status: unknown -> deferred
                                         nosy: + kowey, beschmi
                                         title: Severe limitations of "darcs --xml" -> wish: improve "darcs --xml"
2008-03-28 16:47:28  droundy     unlink  issue33 superseder
2008-03-28 16:47:28  droundy     set     nosy: droundy, tommy, beschmi, kowey, bortzmeyer
                                         superseder: - wish: improve "darcs --xml"
2008-10-12 11:28:18  tux_rocker  link    issue1143 superseder
2008-10-12 22:28:08  twb         set     nosy: + dmitry.kurochkin, dagit, twb, simon, thorkilnaur
2008-10-21 01:48:47  twb         set     nosy: droundy, tommy, beschmi, kowey, bortzmeyer, dagit, simon, twb, thorkilnaur, dmitry.kurochkin
                                         messages: + msg6348
2008-10-21 01:52:50  twb         set     nosy: droundy, tommy, beschmi, kowey, bortzmeyer, dagit, simon, twb, thorkilnaur, dmitry.kurochkin
                                         messages: + msg6350
2009-08-06 17:40:28  admin       set     nosy: + markstos, jast, Serware, darcs-devel, zooko, mornfall, - droundy, bortzmeyer, twb
2009-08-06 20:46:37  admin       set     nosy: - beschmi
2009-08-10 21:58:34  admin       set     nosy: + bortzmeyer, twb, - markstos, darcs-devel, zooko, jast, Serware, mornfall
2009-08-10 23:57:51  admin       set     nosy: - dagit
2009-08-25 17:31:07  admin       set     nosy: + darcs-devel, - simon
2009-08-26 17:56:38  kowey       set     status: deferred -> waiting-for
                                         nosy: tommy, kowey, darcs-devel, bortzmeyer, twb, thorkilnaur, dmitry.kurochkin
                                         superseder: + Should store patch metadata in utf-8, - Match ISO-8601 dates
2009-08-27 14:33:40  admin       set     nosy: tommy, kowey, darcs-devel, bortzmeyer, twb, thorkilnaur, dmitry.kurochkin
2010-06-10 08:51:19  kowey       set     nosy: - darcs-devel
2010-06-10 09:00:32  kowey       unlink  issue1143 superseder
2010-06-10 09:07:10  kowey       set     messages: + msg11350
                                         superseder: + darcs changes --xml is not consistently encoded, use ISO8601 dates in XML output, - Should store patch metadata in utf-8
2017-07-30 23:58:15  gh          set     status: waiting-for -> given-up