Issue 2286 darcs changes: basic usage results in "invalid byte sequence"

Title darcs changes: basic usage results in "invalid byte sequence"
Priority critical Status resolved
Milestone 2.10.0 HEAD Resolved in
Superseder Nosy List ganesh, markstos, owst
Assigned To ganesh

Created on 2012-12-21.18:03:56 by markstos, last changed 2013-01-21.06:49:11 by ganesh.

msg16471 (view) Author: markstos Date: 2012-12-21.18:03:55
1. Summarise the issue (what were doing, what went wrong?)

$ darcs changes --debug-verbose -p 11421 --summary
Beginning identifying repository .
Done identifying repository .
Identified darcs-1 repo: /home/mark/www/trunk
About to read the repository...
Done reading the repository.
About to print the changes...
I'm doing copyFileUsingCache on inventories/0000301602-
I'm doing copyFileUsingCache on inventories/0000648963-
I'm doing copyFileUsingCache on inventories/0003150447-
darcs: ByteString: hGetContents: invalid argument (invalid byte 

2. What behaviour were you expecting instead?

This used to work.

3. What darcs version are you using? (Try: darcs --exact-version)

  $ darcs --version
  2.9.6 (+ 56 patches)

Before I pulled the last ~30 patches, this was not a problem.

4. What operating system are you running?
Ubuntu 12.04 with GHC 7.4.2.
msg16472 (view) Author: markstos Date: 2012-12-21.18:53:38
This was reproduced in the darcs repo, by simply running "darcs 
changes". It seems to be choking on this patch:

Thu Nov 17 01:25:47 EST 2005  Daniel B?nzli <daniel.buenzli@epfl.ch>
  * Fixed documentation of DARCS_GET_HTTP with curl.

Perhaps an uncommon character in "B?nzli" is triggering failure? ( 
msg16473 (view) Author: markstos Date: 2012-12-21.19:01:03
owst says:

'hah! The shell tests don't fail, but "Testing patch metadata decoding:" 
does fail'

Ganesh, owst says you are good with the encoding issues. Thoughts?
msg16474 (view) Author: markstos Date: 2012-12-21.19:20:52
<owst> Interesting, I file thinks the broken inventory in my case is ISO-
<owst> darcs.net/ $ zcat _darcs/inventories/0000005604-
bb001bedd0ab7044fd8e0386b2de3e79eefff69b9fe3776d035a9f2cedb354ad > 
<owst> darcs.net/ $ file busted_inventory.txt 
<owst> busted_inventory.txt: ISO-8859 text
<owst> darcs.net/ $ zcat _darcs/inventories/0000039357-
ec588bcc63ff07b473b17e855d55d0858e7d7ecd9563a0849edaab1f672fc68c > 
<owst> darcs.net/ $ file good_inventory.txt 
<owst> good_inventory.txt: ASCII text
<owst> busted_inventory was the one that contained that patch by Daniel
<owst> anyway, I'm sure that's something to do with it :-)
msg16475 (view) Author: owst Date: 2012-12-21.19:24:00
This can be reproduced for me in the current screened repo: 
> darcs cha -p 'Added a test for changes --context patch selection.'

Sun Nov 20 17:01:57 GMT 2005  darcs: ByteString: hGetContents: invalid
argument (invalid byte sequence)

Strangely, annotate doesn't fail and displays fine:

>  darcs annotate -p 'Added a test for changes --context patch selection.'
[Added a test for changes --context patch selection.
Daniel Bünzli <daniel.buenzli@epfl.ch>**20051120170157] hunk
./tests/changes.pl 45
+### $
+like(`$DARCS changes --context --from-patch="num 1\$" --to-patch="num
+     qr/^\n.*\n\n.*num 4\n.*\n\n.*num 3\n.*\n\n.*num 2\n.*\n\n.*num
+     'changes --context --from-patch="num 1$" --to-patch="num 4$"');

Finding the inventory file containing the problem patch by Daniel:

darcs.net/ $ zcat
> busted_inventory.txt
darcs.net/ $ file busted_inventory.txt 
busted_inventory.txt: ISO-8859 text

but taking the inventory mentioned in pristine.hashed:

darcs.net/ $ zcat
> good_inventory.txt
darcs.net/ $ file good_inventory.txt 
good_inventory.txt: ASCII text

That must be something to do with it!
msg16476 (view) Author: markstos Date: 2012-12-21.19:29:19
I confirmed that removing this patch removes this issue (so, I think 
this patch must have introduced the issue):

  use bytestring-handle instead of haskeline for encoding

Let's roll it back soon if there's not quick fix that allows us to use 
bytestring-handle without triggering this bug.
msg16477 (view) Author: ganesh Date: 2012-12-21.21:21:05
Definitely mine to fix. I didn't pay enough attention to the statement 
"Unrecognized byte sequences in the input are ignored" in the code I was 
msg16484 (view) Author: markstos Date: 2012-12-26.14:40:38
Thanks for looking at this Ganesh. 

I recommend immediately issuing a "rollback" patch or oblit'ing the bad 
patch from darcs-screened, as it would leave a rather bad impression for 
anyone trying "darcs-screened" in the meantime, expecting it to be 
moderately usable.
msg16489 (view) Author: ganesh Date: 2012-12-28.20:26:00
I've obliterated it from screened for now. I'll put it back once I figure 
out an adequate fix to go on top of it.
msg16547 (view) Author: ganesh Date: 2013-01-21.06:49:10
This should now be fixed properly in HEAD (screened) and on the 2.8 
branch. See patch1013.
Date User Action Args
2012-12-21 18:03:56markstoscreate
2012-12-21 18:53:42markstossetmessages: + msg16472
2012-12-21 19:01:04markstossetnosy: + ganesh
messages: + msg16473
2012-12-21 19:20:54markstossetmessages: + msg16474
2012-12-21 19:24:02owstsetmessages: + msg16475
2012-12-21 19:29:21markstossetmessages: + msg16476
2012-12-21 19:29:35markstossetnosy: + owst
2012-12-21 21:21:08ganeshsetpriority: critical
status: unknown -> needs-diagnosis/design
milestone: 2.10.0 HEAD
messages: + msg16477
assignedto: ganesh
2012-12-26 14:40:39markstossetmessages: + msg16484
2012-12-28 20:26:01ganeshsetmessages: + msg16489
2013-01-21 06:49:11ganeshsetstatus: needs-diagnosis/design -> resolved
messages: + msg16547