Issue 2569: bug in decoding multibyte characters crossing a 4096-byte boundary

Title: bug in decoding multibyte characters crossing a 4096-byte boundary
Priority:
Status: needs-diagnosis/design
Milestone: 2.14.0 HEAD
Resolved in:
Superseder:
Nosy List: ganesh
Assigned To:

Created on 2018-02-26.22:40:31 by ganesh, last changed 2018-03-04.17:26:08 by bf.

msg19920 Author: ganesh Date: 2018-02-26.22:40:30
The decode-then-encode test in patch1654 fails if we remove the
restriction that the string to decode must be <= 4096 bytes. I
think this is a bug in knob, as that does something in 4096-byte
chunks.

I also assume (but haven't double-checked) that this is a regression 
from 2.12.

I experimented briefly with using
[bytestring-handle](http://hackage.haskell.org/package/bytestring-handle)
to replace knob but didn't get it working. I'm not sure whether that
was due to lack of time, a stupid mistake, or a fundamental problem
with the idea.

msg19929 Author: bf Date: 2018-03-04.11:58:26
This looks like a bug in the knob library, or at least an
incompatibility. It hasn't seen any updates since 2012. Looking at the
code doesn't reveal any obvious faults, but it does use a lot of GHC
internals, including GHC.IO.Buffer where I guess the incompatibility
lies hidden.

msg19930 Author: bf Date: 2018-03-04.12:02:05
In particular, I see this line

	newBuffer _ = IO.newByteBuffer 4096

in the knob source code, which is a pretty strong hint...
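
To illustrate why a fixed 4096-byte buffer is suspicious here, the following is a minimal sketch (not knob's code, and not a reproduction of the failure) of how a two-byte UTF-8 character starting at offset 4095 ends up split across two chunks. The names `chunkSize`, `sample`, `splitAtBoundary`, and `boundaryBytes` are made up for this example, and whether knob mishandles exactly this case is an assumption.

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8)

-- Buffer size taken from the knob line quoted above.
chunkSize :: Int
chunkSize = 4096

-- 4095 ASCII 'a' bytes followed by U+00E9, which UTF-8 encodes
-- as the two bytes 0xC3 0xA9. The character starts at offset 4095.
sample :: B.ByteString
sample = B.replicate 4095 0x61 `B.append` B.pack [0xC3, 0xA9]

-- Split the input the way a fixed-size buffer would see it.
splitAtBoundary :: B.ByteString -> (B.ByteString, B.ByteString)
splitAtBoundary = B.splitAt chunkSize

-- Last byte of the first chunk and first byte of the second chunk.
-- Partial: only meant for inputs longer than chunkSize.
boundaryBytes :: B.ByteString -> (Word8, Word8)
boundaryBytes bs =
  let (front, rest) = splitAtBoundary bs
  in (B.last front, B.head rest)
```

A decoder that treats each chunk independently sees an incomplete sequence at the end of the first chunk, so it has to carry the leftover byte over to the next chunk to decode the character correctly.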

msg19931 Author: bf Date: 2018-03-04.16:33:06
Let's take a step back. The only reason I introduced knob was to get
at the functionality of GHC's internal fileSystemEncoding. When you look
at GHC.IO.Encoding you see that it actually exposes the encoding and
decoding functions; it's just that they work in a stateful manner on
things of type GHC.IO.Buffer.Buffer. I wanted to avoid having to fiddle
with these things, but it looks like we have to.

I think I can come up with reasonably compact versions of encode and
decode that work more directly with a given TextEncoding.

msg19932 Author: bf Date: 2018-03-04.16:49:20
I think
has everything we need. Converting between CStringLen and ByteString is
explicitly supported.
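
One possible shape for such encode and decode helpers, as a sketch only: it assumes the `GHC.Foreign` module from base (the message above does not say which module was meant, and patch1658 may do something different), and the names `encodeWith` and `decodeWith` are hypothetical.

```haskell
import qualified Data.ByteString as B
import qualified GHC.Foreign as GHC
import GHC.IO.Encoding (TextEncoding, utf8)

-- | Encode a String to bytes using the given TextEncoding.
-- GHC.withCStringLen marshals the String into a temporary buffer,
-- and B.packCStringLen copies that buffer into a ByteString.
encodeWith :: TextEncoding -> String -> IO B.ByteString
encodeWith enc s = GHC.withCStringLen enc s B.packCStringLen

-- | Decode bytes to a String using the given TextEncoding.
-- B.useAsCStringLen exposes the bytes as a CStringLen, which
-- GHC.peekCStringLen then decodes.
decodeWith :: TextEncoding -> B.ByteString -> IO String
decodeWith enc bs = B.useAsCStringLen bs (GHC.peekCStringLen enc)

-- | Round-trip check in the spirit of the decode-then-encode test,
-- here with utf8 so the result does not depend on the locale.
roundTripsUtf8 :: String -> IO Bool
roundTripsUtf8 s = do
  bytes <- encodeWith utf8 s
  s'    <- decodeWith utf8 bytes
  return (s' == s)
```

The encoding to pass in could be obtained with `GHC.IO.Encoding.getFileSystemEncoding`. Because the whole ByteString is handed over in one piece, nothing in this sketch itself splits the input at a fixed size.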

msg19933 Author: bf Date: 2018-03-04.17:26:07
See patch1658.

History
Date                 User    Action  Args
2018-02-26 22:40:31  ganesh  create
2018-02-26 22:41:33  ganesh  link    patch1654 issues
2018-03-04 11:58:28  bf      set     messages: + msg19929
2018-03-04 12:02:06  bf      set     messages: + msg19930
2018-03-04 16:33:08  bf      set     messages: + msg19931
2018-03-04 16:49:21  bf      set     messages: + msg19932
2018-03-04 17:26:08  bf      set     messages: + msg19933