On IRC, Wolfgang Jeltsch observed:
The problem with the Printable type is that it mixes values of type
String (sequences of characters) and values of type ByteString
(sequences of bytes), which are completely different things. :-(
Mixing bytes and characters freely works only as long as you fix an
encoding. Well, maybe you always use UTF-8 internally. But even then,
this is not visible in the code, since a ByteString doesn’t carry
encoding info.
http://irclog.perlgeek.de/darcs/2010-12-16#i_3094602
Doing some research, I found that the justification for this support is in
the following patch:
Sun Jun 13 01:02:34 BST 2004 jch@pps.jussieu.fr
* Avoid unpacking PackedStrings in the printer.
Darcs reads file data into PackedStrings, but unpacks them when
printing out a patch.
The fix is to make the printer able to grok streams of arbitrary
tokens, not just Haskell strings (streams of Char). See the type
class Printer.Printable and the instance Printer.PChar. See also the
type synonim PrintPatch.PrinterType, which is what gets actually used.
The net effect is that darcs whatsnew is more than twice as fast, and
darcs pull of large patches uses 10 (!) times less memory. On the
other hand, darcs pull of many small patches uses up a few percent
more CPU time, which I don't understand.
Wolfgang believes that since we are now caring about user locales, this
approach may no longer be valid.
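
To make the concern concrete, here is a minimal sketch, not the actual darcs
code. The Printable type below is a simplified, hypothetical stand-in for
Printer.Printable, and renderUtf8, renderChar8 and encodeCharUtf8 are names
invented for this example. It only assumes the base and bytestring packages.
It shows that the String parts of a document turn into different bytes
depending on which encoding the printer silently picks, while the ByteString
parts pass through untouched whatever encoding they were originally written in.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical, simplified stand-in for the Printable type under discussion.
data Printable
  = S String          -- a sequence of Unicode characters
  | PS B.ByteString   -- a sequence of raw bytes; no encoding is recorded

-- Hand-written UTF-8 encoder, used only to keep the sketch dependency-free.
encodeCharUtf8 :: Char -> [Word8]
encodeCharUtf8 c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n      = ord c
    hi k   = fromIntegral (n `shiftR` k)
    cont k = 0x80 .|. fromIntegral ((n `shiftR` k) .&. 0x3F)

-- Printer that assumes UTF-8 for the character parts.
renderUtf8 :: Printable -> B.ByteString
renderUtf8 (S s)  = B.pack (concatMap encodeCharUtf8 s)
renderUtf8 (PS b) = b

-- Printer that truncates every Char to 8 bits, as Char8.pack does.
renderChar8 :: Printable -> B.ByteString
renderChar8 (S s)  = BC.pack s
renderChar8 (PS b) = b
```

With these definitions, B.unpack (renderUtf8 (S "\233")) gives [195,169],
while B.unpack (renderChar8 (S "\233")) gives [233]. Both printers return a
PS value unchanged, so nothing in the types records whether the two kinds of
content agree on an encoding.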
It could be useful to research this question from a code quality
perspective.