darcs

Issue 1717 Confusing behaviour when "No newline at end of file"

Title Confusing behaviour when "No newline at end of file"
Priority bug Status deferred
Milestone 3.0.0 Resolved in
Superseder Nosy List darcs-devel, dmitry.kurochkin, ganesh, twb
Assigned To
Topics

Created on 2009-12-14.07:13:58 by twb, last changed 2010-06-15.21:10:45 by admin.

Messages
msg9611 (view) Author: twb Date: 2009-12-14.07:13:55
Darcs appears to assume that "empty" files in fact contain a single ^J
character.  I discovered this when an application created text files
without a tailing newline -- darcs whatnew and record both considered
the file to have had automatic newline REMOVED.

This behaviour is VERY VERY confusing, and it was only by chance that
I managed to diagnose the symptoms.

Below is a demonstration script

    #!/bin/bash -x
    darcs --version
    /usr/bin/darcs --version
    darcs init
    touch f
    darcs rec -lam 'Add f (empty).' f

    cat >f <<EOF
    Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras posuere
    mattis enim, nec elementum lorem vehicula non. Donec eget eros orci,
    sed feugiat tellus.

    Morbi nisi lectus, malesuada a aliquam sed, tempus id orci. Quisque
    sit amet accumsan risus. Pellentesque ultrices diam ut ligula
    scelerisque in porta libero euismod.
    EOF

    perl -pi -0 -e 's/\n$//' f    # delete the final newline

    darcs whatsnew f
    darcs record f <<<nnnq

    ## Note that the record behaviour is slightly different for 2.3.0.
    /usr/bin/darcs record f <<<nnnq

    darcs rec -am 'Add f (contents).' f
    darcs diff -p .

and its output

    with-temp-dir: entering directory `/tmp/with-temp-dir.tro2Eo'
    + darcs --version
    2.3.1 (+ 381 patches)
    + /usr/bin/darcs --version
    2.3.0 (release)
    + darcs init
    + touch f
    + darcs rec -lam 'Add f (empty).' f
    Recording changes in "f":

    Finished recording patch 'Add f (empty).'
    + cat
    + perl -pi -0 -e 's/\n$//' f
    + darcs whatsnew f
    What's new in "f":

    hunk ./f 1
    +Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras posuere
    +mattis enim, nec elementum lorem vehicula non. Donec eget eros orci,
    +sed feugiat tellus.
    hunk ./f 5
    +Morbi nisi lectus, malesuada a aliquam sed, tempus id orci. Quisque
    +sit amet accumsan risus. Pellentesque ultrices diam ut ligula
    +scelerisque in porta libero euismod.
    + darcs record f
    Recording changes in "f":

    hunk ./f 1
    +Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras posuere
    +mattis enim, nec elementum lorem vehicula non. Donec eget eros orci,
    +sed feugiat tellus.
    Shall I record this change? (1/2)  [ynWesfvplxdaqjk], or ? for help:
    hunk ./f 5
    +Morbi nisi lectus, malesuada a aliquam sed, tempus id orci. Quisque
    +sit amet accumsan risus. Pellentesque ultrices diam ut ligula
    +scelerisque in porta libero euismod.
    Shall I record this change? (2/2)  [ynWesfvplxdaqjk], or ? for help:
    Ok, if you don't want to record anything, that's fine!
    + /usr/bin/darcs record f
    Recording changes in "f":

    hunk ./f 1
    -
    +Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras posuere
    +mattis enim, nec elementum lorem vehicula non. Donec eget eros orci,
    +sed feugiat tellus.
    +
    +Morbi nisi lectus, malesuada a aliquam sed, tempus id orci. Quisque
    +sit amet accumsan risus. Pellentesque ultrices diam ut ligula
    +scelerisque in porta libero euismod.
    Shall I record this change? (1/1)  [ynWsfvplxdaqjk], or ? for help:
    Ok, if you don't want to record anything, that's fine!
    + darcs rec -am 'Add f (contents).' f
    Recording changes in "f":

    Finished recording patch 'Add f (contents).'
    + darcs diff -p .
    Mon Dec 14 18:11:45 EST 2009  Trent W. Buck <trentbuck@gmail.com>
      * Add f (contents).
    diff -rN -u -purd old-with-temp-dir.tro2Eo/f new-with-temp-dir.tro2Eo/f
    --- old-with-temp-dir.tro2Eo/f	2009-12-14 18:11:46.024236048 +1100
    +++ new-with-temp-dir.tro2Eo/f	2009-12-14 18:11:46.024236048 +1100
    @@ -0,0 +1,7 @@
    +Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras posuere
    +mattis enim, nec elementum lorem vehicula non. Donec eget eros orci,
    +sed feugiat tellus.
    +
    +Morbi nisi lectus, malesuada a aliquam sed, tempus id orci. Quisque
    +sit amet accumsan risus. Pellentesque ultrices diam ut ligula
    +scelerisque in porta libero euismod.
    \ No newline at end of file
    with-temp-dir: leaving directory `/tmp/with-temp-dir.tro2Eo'
msg9612 (view) Author: ganesh Date: 2009-12-14.10:28:13
I don't think the assumption is that there's a single ^J - if that were the
case, then it wouldn't report a diff between an empty file and one containing a
single ^J, but it does.

The issue is more that it models every file as a non-empty list of strings, with
the invariant

    file_contents = Data.List.intercalate "\n" file_lines

(where intercalate sticks the separator between each element but not at the end)

So the empty file corresponds to [""], a file with a newline at the end
corresponds to [..., ""], and a file doesn't have an empty string at the end of
the list.

I think this model is necessary. The alternative I first thought of was

    file_contents = unlines file_lines

(where unlines adds a newline after every single line).

But this simply can't represent files without trailing newlines (and indeed,
unlines . lines is not the identity:

    Prelude> unlines (lines "wibble")
    "wibble\n"

So I don't see any obvious alternative to the current behaviour.
msg9625 (view) Author: twb Date: 2009-12-14.12:22:11
Ganesh Sittampalam wrote:
> So I don't see any obvious alternative to the current behaviour.

Can we at least inform the user (as diff does)?  Currently a newly
added file is "inexplicably" recorded as two separate hunks.
msg9628 (view) Author: ganesh Date: 2009-12-14.12:29:41
On Mon, 14 Dec 2009, Trent W. Buck wrote:

>
> Trent W. Buck <trentbuck@gmail.com> added the comment:
>
> Ganesh Sittampalam wrote:
>> So I don't see any obvious alternative to the current behaviour.
>
> Can we at least inform the user (as diff does)?  Currently a newly
> added file is "inexplicably" recorded as two separate hunks.

I'd be happy with that (assuming that the message was only emitted when 
there might be confusion.)

BTW even if we do identify a better behaviour it'd be a patch format 
change to actually implement it.

Ganesh
History
Date User Action Args
2009-12-14 07:13:58twbcreate
2009-12-14 10:28:15ganeshsetnosy: + ganesh
messages: + msg9612
2009-12-14 12:22:14twbsetmessages: + msg9625
2009-12-14 12:29:42ganeshsetmessages: + msg9628
2009-12-15 01:52:13twbsettopic: + Target-3.0
2009-12-18 08:29:24twbsetstatus: unknown -> deferred
2009-12-24 10:27:40koweysetpriority: bug
2010-06-15 21:10:44adminsettopic: - Target-3.0
2010-06-15 21:10:45adminsetmilestone: 3.0.0