Patch 1226: resolve issue2416: removed obsolete Darcs.Util.ByteStr...

Title	resolve issue2416: removed obsolete Darcs.Util.ByteStr...
Superseder		Nosy List	bfrk, ganesh
Related Issues
Status	obsoleted	Assigned To
Milestone

Created on 2014-11-15.22:32:29 by bfrk, last changed 2014-11-17.22:26:47 by bfrk. Tracked on DarcsWatch.

Files
File name	Status	Uploaded	Type	Edit	Remove
make-the-options-type-used-by-a-command-into-a-type-parameter_and_4_others.dpatch		bfrk, 2014-11-16.20:11:28	application/x-darcs-patch
patch-preview.txt		bfrk, 2014-11-15.22:32:29	text/x-darcs-patch
resolve-issue2416_-removed-obsolete-darcs_util_bytestring_decodestring.dpatch		bfrk, 2014-11-15.22:32:29	application/x-darcs-patch
unnamed		bfrk, 2014-11-15.22:32:29

See mailing list archives for discussion on individual patches.

Messages
msg17812 (view)	Author: bfrk	Date: 2014-11-15.22:32:29
Note this only resolves the bug, it doesn't add any unit tests. I am now convinced that decodeString is actually harmful with modern (i.e. > 7.0) ghc. 1 patch for repository http://darcs.net/screened: patch 5f4b9bb00fe09b25ee6b15e5171c765230f8db76 Author: Ben Franksen <benjamin.franksen@helmholtz-berlin.de> Date: Sat Nov 15 23:27:38 CET 2014 * resolve issue2416: removed obsolete Darcs.Util.ByteString.decodeString Attachments patch-preview.txt resolve-issue2416_-removed-obsolete-darcs_util_bytestring_decodestring.dpatch unnamed
msg17814 (view)	Author: ganesh	Date: 2014-11-15.23:00:48
Unfortunately this seems to break the utf8.sh shell test :-( I agree we really should be getting rid of things like this though.
msg17815 (view)	Author: bfrk	Date: 2014-11-15.23:24:44
Hm, it works fine when I run the test on my machine. I am running ubuntu 14.04 with ghc-7.6.3 (ubuntu package). My locale is de_DE.UTF-8. Can you say where exactly the test fails?
msg17816 (view)	Author: ganesh	Date: 2014-11-15.23:37:19
Looks like it's at "patch name from command line not UTF-8-encoded!" - I'm running NixOS 14.04, locale en_GB.UTF-8.
msg17817 (view)	Author: bfrk	Date: 2014-11-16.00:39:26
I ran the test as > ./dist/build/darcs-test/darcs-test --tests=utf which gave me no indication that I have to generate an iso885915 encoding for the test to do something ;) When I add this locale is fails for me, too. I begin to understand why this is the case. What I had trouble to understand: how does decodeString not destroy entering utf-8 encoded strings? I think it does do that -- but only for characters with a code point above 255 and I happen to be part of the lucky minority of people whose native language encoding involves only code points below. So the next question is: do we still want to support the 8-bit-only non-unicode encodings at the cost of alienating those whose native encoding needs code points above 255? That would entail scrapping the so called utf8.sh test (which should actually be called iso88915.sh). I am pretty sure, if we asked around how other projects handle this, it would turn out we are the only ones sticking to latin encodings. Note this whole issue is purely about strings we get from the command line.
msg17818 (view)	Author: bfrk	Date: 2014-11-16.00:42:37
I wrote: > That would entail scrapping the so > called utf8.sh test (which should actually be called iso88915.sh). The "that" refers to the implicit proposal (framed as a question) to no longer support latin encodings for strings we get from the command line.
msg17819 (view)	Author: ganesh	Date: 2014-11-16.02:02:43
I'm quite happy to make a principled change to darcs' target behaviour, but I can't pretend to understand the issues fully. Unfortunately encoding issues tend to be quite complex :-( Some general thoughts, in a somewhat random order: I did do some work on the encoding code a while ago to manage the transition to GHC 7.2's behaviour, but I've mostly forgotten what I learnt then. From what I remember, I mostly tinkered around the edges to change the way we dealt with incoming data, rather than really understanding exactly what was going on. I think there are also various open bugs relating to our internationalization, so I'm sure the current behaviour is not perfect. Is it worth writing a test to demonstrate your hypothesis about destroying incoming utf-8? I also agree that the code implies this would happen but it would be good to be sure. I would also like to understand why we can't support both scenarios, depending on the user's current locale. Is it logically impossible or just difficult because of the way darcs's code is written, for example.
msg17820 (view)	Author: bfrk	Date: 2014-11-16.13:44:34
You raise a number of good questions. * Writing a test to demonstrate my claim: yes, good idea. I'll try to do that. * Support both scenarios: That would be ideal, of course. Now that you ask, I think this should be possible. However, digging into the base library sources and searching around in the available libraries I found no simple way to do that. I have sent a message to haskell-cafe, perhaps someone has an idea.
msg17822 (view)	Author: ganesh	Date: 2014-11-16.15:32:05
I (belatedly) now remembering a bit more of the history behind the current situation. Despite being responsible for the recent changes it's rather a blank in my memory :-) - we actually force the encoding we read from the command-line to 'char8': see patch777. - some of the current encoding code was imported from haskeline when that stopped exporting the helper functionality we were previously using: see patch1013. Not sure if that's directly relevant to the current situation. Generally I'm not suggesting we should investigate this issue to death and come up with a perfect solution. But we should be a bit careful not just to flip-flop based on one specific problem at hand, and we should document what we expect to support with tests (and of course remove the tests we no longer wish to support).
msg17823 (view)	Author: bfrk	Date: 2014-11-16.17:08:04
OMG. This is all so messed up I am getting headaches just from thinking about it. So (since patch777) we do 'setFileSystemEncoding char8'. And since we do that before 'System.IO.initForeignEncoding' has been called for the first time, it permanently overrides the users locale, as seen by all Haskell libraries. IIUC, the main reason it's been done that way is so that things like file names can be round-tripped no matter what the encoding is. On the other hand, we want to store patch meta data in utf-8. But stuff from the command line is (unfortunately?) also subject to encoding, using initForeignEncoding, which we globally set to char8. Therefore the decodeString function. We can solve the original problem (issue2416) by adding a reverse function to Darcs.Util.ByteString and using that in the Option definitions when defining ounparse. The larger problem (the reason we enforce char8 as the default) should perhaps be discussed elsewhere.
msg17824 (view)	Author: ganesh	Date: 2014-11-16.17:46:25
> IIUC, the main reason it's been done that way > is so that things like file names can be round-tripped > no matter what the encoding is. Yes, I think that's the motivation. I spent a while investigating alternatives (see issue2095) but this ended up being the simplest "solution". I had some vague thoughts in the long-term of using newtypes around String and ByteString to identify where in our code what encoding is in use (the obvious ones being Char8, UTF-8 or user-locale), but that's not going to be for 2.10. As you say that's probably best discussed elsewhere. > We can solve the original problem (issue2416) by > adding a reverse function to Darcs.Util.ByteString > and using that in the Option definitions when > defining ounparse. Isn't the problem here that 'decodeString' isn't invertible? The example I gave in issue2416 was derived from running the utf8.sh with patch1222, and it reflects a tag name being passed on the command-line.
msg17825 (view)	Author: bfrk	Date: 2014-11-16.18:54:44
> > We can solve the original problem (issue2416) by > > adding a reverse function to Darcs.Util.ByteString > > and using that in the Option definitions when > > defining ounparse. > Isn't the problem here that 'decodeString' isn't invertible? The > example I gave in issue2416 was derived from running the utf8.sh > with patch1222, and it reflects a tag name being passed on the > command-line. You are probably right. What I am certain of is that applying decodeString when creating the DarcsFlag but not applying an inverse when going back from the DarcsFlag to a String is wrong. You say there is no inverse, so I think the only viable solution is to either apply decodeString before options are parsed or afterwards when they are used. I you have an idea how to get out of this mess, please go ahead.
msg17826 (view)	Author: bfrk	Date: 2014-11-16.20:11:28
Okay, here is another patch bundle that implements a solution according to what Ganesh proposed. With this patch I no longer get failures for the utf-8.sh test. It also includes some small bugfix patches. However, workingdir.sh is broken now. Attachments make-the-options-type-used-by-a-command-into-a-type-parameter_and_4_others.dpatch
msg17830 (view)	Author: ganesh	Date: 2014-11-17.06:56:19
A quick update on test failures following our IRC conversation. The send failures are entirely caused by my "make the options type into a type parameter" patches. I don't know why I didn't see them before, but now they happen without any of your patches from this bundle. Your patches from this bundle commute with the options type changes, so we can sort them out separately unless you expect some semantic dependency. The workingdir.sh failure is actually caused by "fix: the toPatch option was broken, probably due to some cut/paste". I haven't figured out exactly how we go to this state yet; the fix is clearly correct, but it means the flag isn't recognised by the call to havePatchsetMatch in Darcs.Repository.copyRepoAndGoToChosenVersion. I'll update the bug if I figure out more.
msg17831 (view)	Author: ganesh	Date: 2014-11-17.07:19:14
OK, I figured out the workingdir problem. Before the options rewrite, --to-patch was actually parsed in two different ways depending on command: matchOneContext = DarcsMultipleChoiceOption [DarcsArgOption [] ["to-match"] OnePattern "PATTERN" "select changes up to a patch matching PATTERN", DarcsArgOption [] ["to-patch"] OnePatch "REGEXP" "select changes up to a patch matching REGEXP", __tag, DarcsAbsPathOption [] ["context"] Context "FILENAME" "version specified by the context in FILENAME" ] and matchOneContext = DarcsMultipleChoiceOption [DarcsArgOption [] ["to-match"] OnePattern "PATTERN" "select changes up to a patch matching PATTERN", DarcsArgOption [] ["to-patch"] OnePatch "REGEXP" "select changes up to a patch matching REGEXP", __tag, DarcsAbsPathOption [] ["context"] Context "FILENAME" "version specified by the context in FILENAME" ] It looks like matchOneContext was only used by clone. I can sort of see why this approach made sense for clone because you can't get a proper range of patches, you really have to start at the beginning of the repo, but the user would expect --to-patch to behave as it says. On the other hand having two different parsings inside Darcs is pretty confusing. I guess we should just change the clone code to understand 'UpToPatch' etc properly.
msg17838 (view)	Author: bfrk	Date: 2014-11-17.22:26:47
Thanks for the analysis Ganesh. I thinks this explains a lot of the strange things I have seen. I agree that the better approach is to deal with this in the clone command.

History
Date	User	Action	Args
2014-11-15 22:32:29	bfrk	create
2014-11-15 23:00:49	ganesh	set	status: needs-screening -> in-discussion nosy: + ganesh messages: + msg17814
2014-11-15 23:24:44	bfrk	set	messages: + msg17815
2014-11-15 23:37:19	ganesh	set	messages: + msg17816
2014-11-16 00:39:27	bfrk	set	messages: + msg17817
2014-11-16 00:42:37	bfrk	set	messages: + msg17818
2014-11-16 02:02:44	ganesh	set	messages: + msg17819
2014-11-16 13:44:34	bfrk	set	messages: + msg17820
2014-11-16 15:32:05	ganesh	set	messages: + msg17822
2014-11-16 17:08:04	bfrk	set	messages: + msg17823
2014-11-16 17:46:25	ganesh	set	messages: + msg17824
2014-11-16 18:54:44	bfrk	set	messages: + msg17825
2014-11-16 20:11:28	bfrk	set	files: + make-the-options-type-used-by-a-command-into-a-type-parameter_and_4_others.dpatch messages: + msg17826
2014-11-16 20:12:05	darcswatch	set	darcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_reviewed.html#bundle-58bd717d4811a1331ddb8569c1806f15dbc3e175
2014-11-17 06:56:20	ganesh	set	messages: + msg17830
2014-11-17 07:19:14	ganesh	set	messages: + msg17831
2014-11-17 22:26:47	bfrk	set	status: in-discussion -> obsoleted messages: + msg17838

Patch 1226 resolve issue2416: removed obsolete Darcs.Util.ByteStr...

Attachments

Attachments