darcs

Issue 2391 unwanted colorization control chars on dumb terminal (Unicode text)

Title: unwanted colorization control chars on dumb terminal (Unicode text)
Priority: bug
Status: needs-diagnosis/design
Milestone:
Resolved in:
Superseder:
Nosy List: imz, jaredj
Assigned To:
Topics: ProbablyEasy, UI

Created on 2014-05-20.08:24:06 by imz, last changed 2014-05-20.14:40:43 by kowey.

Messages
msg17471 Author: imz Date: 2014-05-20.08:24:01
1. Summarise the issue (what were you doing, what went wrong?)

As we know from another bug report (http://bugs.darcs.net/issue2389),
in the output of "darcs whatsnew", Cyrillic letters are not printed as
is, but rather as wrong(!) codes:

$ darcs init
$ echo Здравствуйте > hello.txt
$ darcs add -r .
$ darcs wh 
addfile ./hello.txt
hunk ./hello.txt 1
+<U+00D0><U+0097><U+00D0><U+00B4><U+00D1><U+0080><U+00D0><U+00B0><U+00D0><U+00B2><U+00D1><U+0081><U+00D1><U+0082><U+00D0><U+00B2><U+00D1><U+0083><U+00D0><U+00B9><U+00D1><U+0082><U+00D0><U+00B5>
$ locale
LANG=ru_RU.UTF-8
LC_CTYPE="ru_RU.UTF-8"
LC_NUMERIC="ru_RU.UTF-8"
LC_TIME="ru_RU.UTF-8"
LC_COLLATE="ru_RU.UTF-8"
LC_MONETARY="ru_RU.UTF-8"
LC_MESSAGES="ru_RU.UTF-8"
LC_PAPER="ru_RU.UTF-8"
LC_NAME="ru_RU.UTF-8"
LC_ADDRESS="ru_RU.UTF-8"
LC_TELEPHONE="ru_RU.UTF-8"
LC_MEASUREMENT="ru_RU.UTF-8"
LC_IDENTIFICATION="ru_RU.UTF-8"
LC_ALL=
$ 

These codes (which are wrong anyway) are colorized (red) in terminals.
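
One way to see why the codes look wrong: each code in the output above matches a single UTF-8 byte of the word rather than a Unicode code point. The sketch below reproduces that pattern. The function names are hypothetical and this is only an illustration of the observed output, not darcs's actual printing code.

```python
def escape_per_byte(text: str) -> str:
    """Escape every UTF-8 byte as if it were its own code point."""
    return "".join(f"<U+{byte:04X}>" for byte in text.encode("utf-8"))


def escape_per_char(text: str) -> str:
    """Escape every character by its real Unicode code point."""
    return "".join(f"<U+{ord(char):04X}>" for char in text)


word = "Здравствуйте"
# Two codes per Cyrillic letter, as in the "darcs wh" output above.
print(escape_per_byte(word))
# One code per letter, which is what a code-point escape would look like.
print(escape_per_char(word))
```

The first line printed has the same shape as the hunk line in the report (two codes per letter); the second shows one code per letter.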

Here comes the new bug:

if we are on a dumb terminal (TERM=dumb) or pipe the command output into
another command, some garbage is printed around the codes that would
otherwise be colorized:

$ echo world > world.txt
$ darcs add world.txt 
$ darcs wh | cat
addfile ./hello.txt
hunk ./hello.txt 1
+[_<U+00D0>_][_<U+0097>_][_<U+00D0>_][_<U+00B4>_][_<U+00D1>_][_<U+0080>_][_<U+00D0>_][_<U+00B0>_][_<U+00D0>_][_<U+00B2>_][_<U+00D1>_][_<U+0081>_][_<U+00D1>_][_<U+0082>_][_<U+00D0>_][_<U+00B2>_][_<U+00D1>_][_<U+0083>_][_<U+00D0>_][_<U+00B9>_][_<U+00D1>_][_<U+0082>_][_<U+00D0>_][_<U+00B5>_]
addfile ./world.txt
hunk ./world.txt 1
+world
$ 

Perhaps this garbage stands in for the color highlighting of these codes,
but it doesn't appear around other highlighted text: for example, the
words "addfile" and "hunk" above are printed in blue in a terminal, yet
they have no garbage around them in the piped output.

This suggests that something is wrong with the function that prints
these colored codes: it still tries to mark them up on a dumb terminal.

Eric Kow suggested that this looks related to
http://bugs.darcs.net/issue918.

2. What behaviour were you expecting instead?

It should not attempt to color things on dumb terminals or in pipes.
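
As a minimal sketch of the expected behaviour (not darcs's implementation, which is written in Haskell), a color decision could check both conditions named above: the output must be a terminal, and TERM must not be "dumb". The function name `should_colorize` and its `environ` parameter are assumptions made for this illustration.

```python
import io
import os


def should_colorize(stream, environ=None) -> bool:
    """Return True only for an interactive terminal that is not 'dumb'."""
    if environ is None:
        environ = os.environ
    term = environ.get("TERM", "")
    # An unset or "dumb" TERM means no color support should be assumed.
    if term in ("", "dumb"):
        return False
    # Pipes and files report isatty() == False; objects without isatty count as non-terminals.
    isatty = getattr(stream, "isatty", None)
    return bool(isatty and isatty())


# An in-memory stream behaves like a pipe here: it is never a terminal.
print(should_colorize(io.StringIO(), {"TERM": "xterm"}))
```

With a check of this kind, both "TERM=dumb" and "darcs wh | cat" would take the uncolored path.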

3. What darcs version are you using? (Try: darcs --exact-version)

ghc7.6.1-darcs-2.8.4-alt2

$ darcs --exact-version
darcs compiled on Feb 26 2013, at 18:09:42

Context:

[TAG 2.8.4
Ganesh Sittampalam <ganesh@earth.li>**20130127231845
 Ignore-this: d032f69540341ecfd5858fce7aee1457
] 

[Resolve issue2155: Expurgate the non-functional annotate --xml-output
option
Dave Love <fx@gnu.org>**20130127231835
 Ignore-this: eb03207031e75687968091d56fb008f8
 backported from HEAD by Ganesh Sittampalam <ganesh@earth.li>
] 

[Resolve issue2155: Expurgate the non-functional annotate --xml-output
option
Dave Love <fx@gnu.org>**20130120121739
 Ignore-this: 8a9ce6409a50b71cd0d2fdabbc181b1a
 backported from HEAD by Ganesh Sittampalam <ganesh@earth.li>
] 

[note dependency bumps in NEWS
Ganesh Sittampalam <ganesh@earth.li>**20130120170310
 Ignore-this: 48cf181c89ec1b69fc6e9e701734ff19
] 

[bump version to 2.8.4
Ganesh Sittampalam <ganesh@earth.li>**20130120154856
 Ignore-this: 2f2542e9825b66cda3a0a17275b5e311
] 

[resolve issue2199: getMatchingTag needs to commute for dirty tags
Ganesh Sittampalam <ganesh@earth.li>**20121218191024
 Ignore-this: 947252cd8e084b793044aff564f0462d
 backported from HEAD
] 

[accept issue2199: darcs get --tag gets too much
Ganesh Sittampalam <ganesh@earth.li>**20120528164525
 Ignore-this: 8c138a80c294e6181a3ef9250593fa31
] 

[constrain haskeline version on old GHC
Ganesh Sittampalam <ganesh@earth.li>**20130119234432
 Ignore-this: 8af00cc3d3c1ad223a8b35712c06bae
] 

[Add option -a to darcs changes in Setup.lhs
Ben Franksen <ben.franksen@online.de>**20120811195807
 Ignore-this: f23d2e558f7248fec8d07b0391d9a7e8
 
 Some (potential) contributors (like me) have 'changes interactive'
 in their ~/.darcs/defaults and then wonder why their build hangs.
] 

[add copyright notices for the imported haskeline code
Ganesh Sittampalam <ganesh@earth.li>**20130119163610
 Ignore-this: afcdc8048f8b3233fa17d3ab0c9c311f
 licence/copyright taken from haskeline 0.6.4.7:
 BSD3, copyright Judah Jacobson
] 

[import encoding code from haskeline: switch over
Ganesh Sittampalam <ganesh@earth.li>**20130118231907
 Ignore-this: b423a92ba93e74520d0578ac21aceab3
] 

[import encoding code from haskeline: source files
Ganesh Sittampalam <ganesh@earth.li>**20130118225947
 Ignore-this: c2d1e228fa4cce3e66e90a14fa2f3200
] 

[import encoding code from haskeline: cabal file changes
Ganesh Sittampalam <ganesh@earth.li>**20130118070642
 Ignore-this: d2ed13887d0c547cb7498bd5a2aef46f
] 

[import encoding code from haskeline: Setup.lhs changes
Ganesh Sittampalam <ganesh@earth.li>**20130115181040
 Ignore-this: 31ccdca76001bff769464fb7a8e574e9
] 

[ROLLBACK: conditionally use bytestring-handle
Ganesh Sittampalam <ganesh@earth.li>**20130111213829
 Ignore-this: d3c18b61f765bdfcb574b4977185197b
 It doesn't skip invalid byte sequences when decoding so breaks on 
 repositories with non-UTF8 encoded metadata.
] 

[bump network dependency
Ganesh Sittampalam <ganesh@earth.li>**20130111184607
 Ignore-this: 54e55fa09793008d55572b6acad1a7b8
] 

[add some comments about "nearby" darcs, and print out the one that was
found
Ganesh Sittampalam <ganesh@earth.li>**20130111211817
 Ignore-this: 57385f3248fc539435ed9de069a40bd5
 backported from HEAD
] 

[make darcs-test look "nearby" for a darcs exe to use
Ganesh Sittampalam <ganesh@earth.li>**20130111211445
 Ignore-this: d952cf330c0d9510c5c973dd41b191e0
 backported from HEAD
] 

[need different path separator on Windows
Ganesh Sittampalam <ganesh@earth.li>**20121219191221
 Ignore-this: 42c7ba1e46f5b6b600d23838e5a162cb
] 

[test for issue2286: make sure we can read repos with non-UTF8 metadata
Ganesh Sittampalam <ganesh@earth.li>**20130102222735
 Ignore-this: adc6165d5d5d991383ebf0e6547f7bf4
] 

[We can use chcp to switch encodings on Windows
Ganesh Sittampalam <ganesh@earth.li>**20130101122254
 Ignore-this: bc115467e31e144694a33e43dca3fb6c
 This means that the tests that require different encodings can run.
] 

[Find latin9 locale on OS X too
Michael Hendricks <michael@ndrix.org>**20120420202408
 Ignore-this: c87db3b97312234ed2380d2ca11a8ca0
 
 Most Linux systems describe latin9 as "iso885915".  OS X
 describes it with "ISO8859-15".  The new regex catches both.
] 

[windows test fix: replace shell script with a Haskell program
Ganesh Sittampalam <ganesh@earth.li>**20121231224332
 Ignore-this: de01ab8647e7d62c18d8c266d514b054
] 

[unsetting DARCS_TEST_PREFS_DIR in utf8 test doesn't seem to be necessary
Ganesh Sittampalam <ganesh@earth.li>**20130101120246
 Ignore-this: ed74710d8b358b920d742e86a7f008d8
 
 It was causing problems on Windows because getAppUserDataDirectory
 still returns the normal user directory.
 
 It also means that the repository type choice isn't picked up.
] 

[improve diagnostics when utf8 test fails
Ganesh Sittampalam <ganesh@earth.li>**20121231224600
 Ignore-this: 63db587bc36f8826c66dc6913a4fdb2d
] 

[need to do case-insensitive comparison on Windows
Ganesh Sittampalam <ganesh@earth.li>**20121228214823
 Ignore-this: ef309d4aef22e87d5c3da3222926af0e
] 

[update NEWS
Ganesh Sittampalam <ganesh@earth.li>**20121216202654
 Ignore-this: aef3e47204a4157504f90554ffc3a327
] 

[conditionally use bytestring-handle instead of haskeline for encoding
Ganesh Sittampalam <ganesh@earth.li>**20121216201745
 Ignore-this: fd758796b689d090d01e003e660e405
 This is transitional because we need to support GHC 6.10: we can switch
 over to bytestring-handle unconditionally on HEAD.
] 

[bump deps for GHC 7.6/latest hackage
Ganesh Sittampalam <ganesh@earth.li>**20121216201543
 Ignore-this: 77829b074a4bff635a421879bdd04be0
] 

[conditionally support tar 0.4
Ganesh Sittampalam <ganesh@earth.li>**20121216201014
 Ignore-this: 8eff0330e6af196727bdd736ef31db25
] 

[recent test-framework seems to require Typeable
Ganesh Sittampalam <ganesh@earth.li>**20121216175601
 Ignore-this: a8cce6b69984bfc2335b5c19688950b3
] 

[stop using Prelude.catch
Ganesh Sittampalam <ganesh@earth.li>**20121216163240
 Ignore-this: b4bfc48775b3337f8f7ebe275be1a058
 
 backported from HEAD
] 

[import constructors of C types to deal with a GHC change
Ganesh Sittampalam <ganesh@earth.li>**20120401132500
 Ignore-this: ab7cf2fb5e9a2494c14fe7394200da9b
] 

[TAG 2.8.3
Ganesh Sittampalam <ganesh@earth.li>**20121104174910
 Ignore-this: 3198f6deecf3d1b44df6e05a2657d9ca
] 

Compiled with:

HTTP-4000.2.6
array-0.4.0.1
base-4.6.0.0
bytestring-0.10.0.0
containers-0.5.0.0
directory-1.2.0.0
extensible-exceptions-0.1.1.4
filepath-1.3.0.1
hashed-storage-0.5.10
haskeline-0.7.0.3
html-1.0.1.2
mmap-0.5.8
mtl-2.1.2
network-2.3.2.0
old-time-1.1.0.1
parsec-3.1.3
process-1.1.0.2
random-1.0.1.1
regex-compat-0.95.1
tar-0.4.0.1
terminfo-0.3.2.5
text-0.11.2.3
unix-2.6.0.0
utf8-string-0.3.7
vector-0.10.0.1
zlib-0.5.4.0
$ 

4. What operating system are you running?

ALT Linux
msg17473 Author: kowey Date: 2014-05-20.13:38:31
Thanks for splitting this off. I'm hoping this is indeed a relatively
straightforward matter of terminal detection, which I *think* is
well-understood in darcs or Haskell land.
msg17474 Author: imz Date: 2014-05-20.14:08:56
Actually, I wasn't sure that these were really valid *colorization
control chars*, because then it's not clear where they would come from.

A "dumb" terminal shouldn't get any at all if the printing code does
terminal detection. So, if there is no terminal detection, these must be
some general colorization chars, perhaps the same as those used for my
usual terminal (rxvt-unicode).

But then "cat" would simply pass them through, wouldn't it? And I would
see them as colors. Instead, I see them as:

_][_

Strange!

(I noticed this because I worked in Emacs eshell, and hence had
TERM=dumb. Then I did the "cat" experiment.)
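
One possible reading, offered as an assumption rather than a diagnosis: the "[_" and "_]" strings are not terminal escape sequences at all, but a plain-text fallback that the printer emits when it decides color is unavailable. That would explain why "cat" shows them literally. The sketch below illustrates such a fallback; `mark_escaped` is a hypothetical name, and red is used only because the report says the codes are shown in red.

```python
ANSI_RED = "\x1b[31m"   # standard SGR sequence for a red foreground
ANSI_RESET = "\x1b[0m"  # standard SGR sequence to reset attributes


def mark_escaped(code: str, use_color: bool) -> str:
    """Highlight an escaped code with color, or with ASCII markers otherwise."""
    if use_color:
        return f"{ANSI_RED}{code}{ANSI_RESET}"
    # Hypothetical fallback: literal brackets instead of control characters.
    return f"[_{code}_]"


codes = ["<U+00D0>", "<U+0097>"]
# Without color, adjacent markers run together as "_][_", as seen in the report.
print("+" + "".join(mark_escaped(code, use_color=False) for code in codes))
```

Under this reading, the expected fix would be to emit the codes with no markers at all when color is off.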
msg17475 Author: kowey Date: 2014-05-20.14:40:42
Let's use something closer to your original title to be on the safe side
(to avoid mischaracterising the problem).
History
Date                 User   Action  Args
2014-05-20 08:24:06  imz    create
2014-05-20 13:38:33  kowey  set     priority: bug
                                    status: unknown -> needs-diagnosis/design
                                    messages: + msg17473
                                    title: cmd output: wrong colorized escaped Unicode chars are surrounded with garbage on dumb terminals -> unwanted colorization control chars on dumb terminal (Unicode text)
2014-05-20 14:08:58  imz    set     messages: + msg17474
2014-05-20 14:40:43  kowey  set     messages: + msg17475