darcs

Patch 1628 windows fixes for 2.14.0

Title windows fixes for 2.14.0
Superseder Nosy List bf, ganesh, gh
Related Issues
Status in-discussion Assigned To ganesh
Milestone 2.14.0 HEAD

Created on 2017-12-16.12:25:04 by ganesh, last changed 2018-03-04.18:41:25 by bf.

Files
File name Status Uploaded Type Edit Remove
use-a-strict-readfile-to-avoid-races.dpatch ganesh, 2017-12-16.12:24:48 application/octet-stream
See mailing list archives for discussion on individual patches.
Messages
msg19807 (view) Author: ganesh Date: 2017-12-16.12:24:48
Attached bundle for discussion only for now.

There are various problems in HEAD on Windows, I think primarily caused by the 
encoding changes.

There are a few build breaks, which I've provisionally fixed in the bundle, but need 
to confirm whether they are actually correct once I've been through the test breaks.

There was also several test breaks, though some are not regressions from 2.12

I've also already fixed one of the test breaks (filepath), which turned out to be due 
to the inadvertent introduction of lazy IO in the encoding changes. Reading prefs 
files had become lazy and was racing somehow with updating it (not quite sure of the 
exact mechanism). I've changed one instance of lazy IO to strict to fix this, but I 
think we need to review the code for any more.

There are three more regressions to look at:

issue2262 (actually a new test, but we should try to fix it anyway)
latin9
show_tags

Just for the record I also got the following non-regressions:

git_quoted_filenames
issue1857
issue2035
lazy-clone
send-external
Attachments
msg19809 (view) Author: ganesh Date: 2017-12-18.19:04:33
show_tags is because the test harness on Windows (msys64) expects UNIX 
line endings, whereas darcs show tags now outputs Windows line endings.  
Feels like darcs is in the right here, I might just change to a text 
diff.
msg19881 (view) Author: ganesh Date: 2018-02-15.07:46:36
The issue2262 behaviour is weird, but I think new darcs is strictly better than old darcs here:

               | msys64 window  | command prompt |
-------------------------------------------------|
old behaviour  | lots of ?s     | lots of ?s     |
new behaviour  | garbage        | correct output |
--------------------------------------------------

Unfortunately the tests don't run under a command prompt, and darcs wh -l still prints out the wrong 
thing if directed to a pipe under msys64, so the tests fail. Also it's not just that the msys64 
shell/window is completely broken for display of Unicode, because 'ls' works fine.

I'll keep digging, but this doesn't seem too critical.

Still to look at: the latin9 regression.
msg19886 (view) Author: bf Date: 2018-02-16.01:02:11
Re garbage on msys64: perhaps this is due to Windows here using a wide
character encoding (utf-16le). I guess the pep383 trick does not work
too well with such an encoding. See
https://stackoverflow.com/questions/1259084/what-encoding-code-page-is-cmd-exe-using
which recommends to first check that the console font contains the
characters to be displayed.
msg19887 (view) Author: bf Date: 2018-02-16.01:09:39
You can also try "chcp 65001" which should set the code page to utf-8.
msg19888 (view) Author: bf Date: 2018-02-16.01:16:13
Sorry, I see now that the test lib has been extended to use the chcp
command with the correct code page on Windows.
msg19889 (view) Author: bf Date: 2018-02-16.01:24:01
Here is what wikipedia says
(https://en.wikipedia.org/wiki/Unicode_in_Microsoft_Windows#UTF-8):

"""
Although the locale can be set so the 'M' encodings handle some
multi-byte encodings, it is not possible to set them to support UTF-8
(attempts to use the locale id, code page 65001, are ignored—this code
page is only used for explicit conversion functions such as
MultiByteToWideChar). As many libraries, including the standard C and
C++ library, only allow access to files using the 'M' API, it is not
possible to open all Unicode-named files with them. Thus Unicode is not
supported by Windows in software using a portable API.
"""

If this is correct it means that the switch_to_utf8 command in the test
lib is useless on Windows.

Should we ask haskell-cafe for help?
msg19905 (view) Author: ganesh Date: 2018-02-19.06:42:09
I'm played with it a bit more and I'm thoroughly confused by the console behaviour on 
Windows, but I don't think it really matters: there's no regression, and indeed your patches 
make things better, we just can't prove it in a test. So let's just add skip_windows to the 
issue2262 test.
msg19906 (view) Author: ganesh Date: 2018-02-19.07:06:32
The latin9 regression seems more serious, though. I didn't think very hard about the semantics of what I was 
doing when I wrote the build fix patches, and I suspect the Darcs.Util.Encoding one is completely wrong - for 
Windows it just goes back to the old behaviour, and it seems like that mishandles things being passed on the 
command-line.

I played with trying to just get Windows to do the same thing as other OSes in Darcs.Util.Encoding and 
Darcs.Utils.External (i.e. set the foreign/locale encodings to the filesystem encoding, and use the knob-
based code) but I get this error:

> darcs.exe: darcs-encode: commitBuffer: invalid argument (invalid character)

I suspect that even the filesystem encoding doesn't roundtrip properly on Windows :-( particularly given this 
comment in the docs about using the W APIs instead: https://hackage.haskell.org/package/base-
4.10.1.0/docs/GHC-IO-Encoding.html#v:getFileSystemEncoding

I'll keep playing, but I'm a bit lost about how this should work.
msg19934 (view) Author: bf Date: 2018-03-04.17:47:46
You could try to use the textEncode and textDecode functions from my
recent patch1658 and try if passing another TextEncoding will work
better. If it does, we should use the same encoding for all text IO, in
particular src/Darcs/UI/External.hs and src/Darcs/Util/Lock.hs. In fact,
we should turn the result of getFileSystemEncoding (or, for Windows,
whatever works better) into a global variable, exported by
Darcs.Util.Encoding.
msg19937 (view) Author: bf Date: 2018-03-04.18:41:24
Another idea that could help: setDarcsEncodings (D.UI.External) sets the
foreignEncoding to the same value as the fileSystemEncoding. This
influences everything that uses e.g. withCString. There are a number of
files where this feature is used:

src/win32/System/Posix/Files.hsc
src/win32/System/Posix/IO.hsc
src/Darcs/UI/External.hs
src/Darcs/Util/Compat.hs
src/Darcs/Util/Download/Curl.hs
src/Darcs/Repository/Job.hs

All these places should be reviewed and a decision should be made
whether we want to change the foreignEncoding globally or rather use
GHC.Foreign everywhere and decide which encoding to use case by case.

Of particular interest in this context may be stuff under src/win32.
History
Date User Action Args
2017-12-16 12:25:05ganeshcreate
2017-12-16 12:25:19ganeshsettitle: windows fixes -> windows fixes for 2.14.0
2017-12-18 19:04:53ganeshsetmessages: + msg19809
2018-02-15 07:46:37ganeshsetmessages: + msg19881
2018-02-16 01:02:11bfsetmessages: + msg19886
2018-02-16 01:09:39bfsetmessages: + msg19887
2018-02-16 01:16:13bfsetmessages: + msg19888
2018-02-16 01:24:01bfsetmessages: + msg19889
2018-02-19 06:42:10ganeshsetmessages: + msg19905
2018-02-19 07:06:32ganeshsetmessages: + msg19906
2018-03-04 17:47:46bfsetmessages: + msg19934
2018-03-04 18:41:25bfsetmessages: + msg19937