darcs

Issue 2018 should Printer drop ByteString support? (or alternatively drop String)

Title: should Printer drop ByteString support? (or alternatively drop String)
Priority: wishlist
Status: needs-reproduction
Milestone:
Resolved in:
Superseder:
Nosy List: kowey
Assigned To:
Topics: Devel

Created on 2010-12-16.16:37:39 by kowey, last changed 2010-12-16.16:42:00 by kowey.

Messages
msg13354 Author: kowey Date: 2010-12-16.16:37:37
On IRC, Wolfgang Jeltsch observed:

The problem with the Printable type is that it mixes values of type 
String (sequences of characters) and values of type ByteString 
(sequences of bytes), which are completely different things. :-( 
 Mixing bytes and characters freely works only as long as you fix an 
encoding. Well, maybe you always use UTF-8 internally. But even then, 
this is not visible in the code, since a ByteString doesn’t carry 
encoding info.

http://irclog.perlgeek.de/darcs/2010-12-16#i_3094602
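
To make Wolfgang's point concrete, here is a minimal sketch (not darcs code) showing that one String corresponds to different byte sequences depending on the encoding chosen. It assumes the `bytestring` and `text` packages; the names `sample`, `utf8Bytes`, `latin1Bytes` and `demo` are made up for illustration.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- A String holding exactly one character, U+00E9 (e with acute accent).
sample :: String
sample = "\233"

-- The same character encoded as UTF-8: two bytes.
utf8Bytes :: B.ByteString
utf8Bytes = TE.encodeUtf8 (T.pack sample)

-- BC.pack truncates each Char to 8 bits, which coincides with Latin-1
-- for code points below 256: one byte.
latin1Bytes :: B.ByteString
latin1Bytes = BC.pack sample

demo :: IO ()
demo = do
  print (length sample)         -- 1
  print (B.unpack utf8Bytes)    -- [195,169]
  print (B.unpack latin1Bytes)  -- [233]
```

Given only the ByteString `[233]` or `[195,169]`, nothing in the type says which encoding produced it, which is the information a Printable built from a ByteString loses.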

Doing some research, I found the justification for this support in the following patch:

Sun Jun 13 01:02:34 BST 2004  jch@pps.jussieu.fr
  * Avoid unpacking PackedStrings in the printer.
  Darcs reads file data into PackedStrings, but unpacks them when
  printing out a patch.
        
  The fix is to make the printer able to grok streams of arbitrary
  tokens, not just Haskell strings (streams of Char).  See the type
  class Printer.Printable and the instance Printer.PChar.  See also the
  type synonim PrintPatch.PrinterType, which is what gets actually used.
  
  The net effect is that darcs whatsnew is more than twice as fast, and
  darcs pull of large patches uses 10 (!) times less memory.  On the
  other hand, darcs pull of many small patches uses up a few percent
  more CPU time, which I don't understand.

Wolfgang believes that now that we care about user locales, this 
approach may no longer be valid.

It could be useful to research this question from a code quality 
perspective.
msg13355 Author: kowey Date: 2010-12-16.16:41:59
Wolfgang adds that part of the problem is that ByteStrings are written to 
stdout directly, whereas Strings are written with hPutStr, which may be 
locale-sensitive from GHC 6.12 on. The same text may therefore be output 
in two different ways depending on its internal type.
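
A rough sketch of the two output paths follows. The `Printable` type here is a simplified stand-in rather than the actual darcs definition, and `hPrintPrintable` and `writeWith` are hypothetical helpers. A file handle with an explicitly chosen encoding stands in for stdout under a given locale.

```haskell
import qualified Data.ByteString as B
import System.IO

-- Simplified stand-in for a printer token: either characters or raw bytes.
data Printable = S String | PS B.ByteString

-- Strings go through hPutStr and are encoded with the handle's encoding;
-- ByteStrings are written as-is and ignore that encoding.
hPrintPrintable :: Handle -> Printable -> IO ()
hPrintPrintable h (S s)   = hPutStr h s
hPrintPrintable h (PS bs) = B.hPut h bs

-- Write one token to a file using the given handle encoding, then
-- return the bytes that actually ended up in the file.
writeWith :: TextEncoding -> FilePath -> Printable -> IO B.ByteString
writeWith enc path p = do
  withFile path WriteMode $ \h -> do
    hSetEncoding h enc
    hPrintPrintable h p
  B.readFile path
```

Under these assumptions, `writeWith utf8 path (S "\233")` yields the bytes `[195,169]` and `writeWith latin1 path (S "\233")` yields `[233]`, while `writeWith enc path (PS (B.pack [233]))` yields `[233]` whichever encoding is set. The output for the same logical text thus depends on which constructor happens to hold it.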
History
Date                 User   Action  Args
2010-12-16 16:37:39  kowey  create
2010-12-16 16:42:00  kowey  set     messages: + msg13355