darcs

Patch 1680 five Bytestring-related patches from Ben's branch

Title five Bytestring-related patches from Ben's branch
Superseder Nosy List bf, gh
Related Issues
Status accepted Assigned To
Milestone

Created on 2018-03-29.18:57:18 by gh, last changed 2018-03-31.21:35:16 by bf.

Files
File name Status Uploaded Type Edit Remove
fix-writing-of-hex-strings-in-darcs_util_bytestring.dpatch gh, 2018-03-30.18:38:35 application/octet-stream
more-efficient-definition-of-unlinesps.dpatch gh, 2018-03-29.18:57:17 application/octet-stream
rollback-definition-of-lineps-and-use-singleton-where-appropriate.dpatch gh, 2018-03-29.22:47:49 application/octet-stream
See mailing list archives for discussion on individual patches.
Messages
msg20085 (view) Author: gh Date: 2018-03-29.18:57:17
Pulled from Ben's branch [1] https://hub.darcs.net/bf/darcs-bf-latest,
the patches are:



5 patches for repository http://darcs.net:

patch 89f8ff164b855cb9b350f243f3289fa6cc37cc15
Author: Ben Franksen <benjamin.franksen@helmholtz-berlin.de>
Date:   Thu Feb 22 16:14:56 -03 2018
  * more efficient definition of unlinesPS

patch 33fba47adbf043d882560351297e90ca73fa2a0d
Author: Ben Franksen <ben.franksen@online.de>
Date:   Thu Feb 22 06:58:00 -03 2018
  * refactor, simplify, and document hunk application

  This refactor does two things: First, it decouples the optimization that
  applies sequences of hunks to the same file in one go from the algorithm
  that applies a single hunk. Second, single hunk application is somewhat
  simplified and extensively documented. This is very subtle and highly
  optimized code that easily breaks.

  This also moves breakAfter/BeforeNthNewline to D.P.Prim.V1.Commute. These
  functions are only used in this one place. They are undocumented and have
  subtle semantics, which makes them unsuitable for re-use.

patch b414441536de216a6f64bca526aa23f29d7d3d0e
Author: Ben Franksen <benjamin.franksen@helmholtz-berlin.de>
Date:   Wed Mar 28 12:47:39 -03 2018
  * re-format import list of Darcs.Util.ByteString

patch 082276dea147999b8b775e82596a4191345388b0
Author: Ben Franksen <ben.franksen@online.de>
Date:   Wed Mar 28 12:56:56 -03 2018
  * fix betweenLinesPS and add a unit test

  This is part of the program to get rid of using ByteString internals.
  The previous version of betweenLinesPS was buggy because it relied on
  implementation details of unlinesPS, see issue2573.

patch 48318938217fde44f04a66edd1bd1e91423178bb
Author: Ben Franksen <benjamin.franksen@helmholtz-berlin.de>
Date:   Wed Mar 28 16:07:51 -03 2018
  * throw out all access to bytestring internals from Darcs.Util.ByteString

  The only remaining function, unsafeWithInternals, is used only in
  Darcs.Util.Hash so it has been moved into that module. The two C files
  src/fpstring.[ch] have been removed.


[1] https://lists.osuosl.org/pipermail/darcs-users/2018-March/027312.html
Attachments
msg20087 (view) Author: gh Date: 2018-03-29.20:31:16
First, a reference before reading these patches is
http://bugs.darcs.net/issue2573  

The review:
  
# more efficient definition of unlinesPS

Basically this:

-unlinesPS [] = BC.empty
-unlinesPS x  = BC.init $ BC.unlines x
+unlinesPS [] = B.empty
+unlinesPS x  = B.concat $ intersperse (BC.pack "\n") x

Looks correct, I guess this is more efficient becauses it does not
append then remove a last "\n".

Why not use (BC.singleton '\n') instead of (BC.pack "\n")?

# refactor, simplify, and document hunk application

This is a heavy one. The idea of breakAfter/BeforeNthNewline to
D.P.Prim.V1.Commute makes sense. That second function's type is
changed (along with the type of other functions of Commute).

applyHunkLines now returns an Either String FileContents instead of
a Maybe FileContents, to improve error reporting.

Also applyHunkLines applied a list of hunks to some FileContents, each
hunk being a triploe (offset,to_remove, to_insert). applyHunkLines was
therefore recursive

Now the new version of this function applies a single hunk. This makes
the code simpler to follow. The caller to applyHunkLines (in both old
and new
versions) is applyHunk(s).

But in fact applyHunkLines was always called with a list of one hunk as
argument (in hunkmod). So this simplification of the code is welcome.

Also this helps simplify the instance Apply Prim for the case of applying
a hunk.

As for correctness, I am just going to trust the quickcheck tests,
which pass.

OK then.


# re-format import list of Darcs.Util.ByteString

Just indenting and moving stuff around.

OK.

# fix betweenLinesPS and add a unit test

Reference:
https://hackage.haskell.org/package/bytestring-0.10.8.2/docs/Data-ByteString.html#v:breakSubstring

Reimplement betweenLinesPS in an efficient way. Also implement a simpler
and less efficient
spec_betweenLinesPS that is used to define a new QuickCheck property in
the test suite.
betweenLinesPS no longer uses linesPS and list-based functions, while
spec_betweenLinesPS
still does. A call to ByteString.Internals.toForeignPtr is removed.

Again, I'm going to trust QuickCheck on that one.

OK.
  
# throw out all access to bytestring internals from Darcs.Util.ByteString

Functions like isFunky, hashPS, fromPS2Hex, fromHex2PS are now defined
without C functions , so we can drop fpstring.c, yay!

We still have one use of unsafeWithInternals in Darcs.Util.Hash (to
calculate
SHA1 hashes). This will go away when we switch to cryptonite.

One question. linesPS definition is changed from:

~~~
linesPS ps
     | B.null ps = [B.empty]
     | otherwise = BC.split '\n' ps
~~~

To:

~~~
linesPS ps =
  case search ps of
    Nothing -> [ps]
    Just n -> B.take n ps : linesPS (B.drop (n + 1) ps)
  where
    search = BC.elemIndex '\n'
~~~

Is the reason of this change to avoid the use of BC.split/B.split:

https://hackage.haskell.org/package/bytestring-0.10.8.2/docs/src/Data.ByteString.html#split

whose definition contains a call to  accursedUnutterablePerformIO, a "bad"
function, judging from the mail thread mentioned in 
http://bugs.darcs.net/issue2573 ?

Then how do we know that B.take, B.drop, BC.elemIndex will never be
implemented in terms
of this bad function?
msg20089 (view) Author: bf Date: 2018-03-29.22:23:04
> Why not use (BC.singleton '\n') instead of (BC.pack "\n")?

No particular reason I can think of.

> # refactor, simplify, and document hunk application
> This is a heavy one.

Indeed. The code is subtle I needed quite a while to understand it. It
is still not as clear as I would like it to be.

One of the conclusions I took away from this refactor is that FileUUID
does the right thing here: its hunks aren't line based any more, a hunk
takes out a slice of the content and replaces it with another one. This
means apply is trivial and the complexities of how we interpret the data
as "lines" is moved out of the core and into the user interface.

> linesPS definition is changed...

I must say I can't remember why I was thinking this is an improvement.
It looks like a mistake to me now. The previous definition is more
concise and I see no reason why it should be less efficient. Please roll
it back. And thanks for spotting it.

I was probably confusing this with the change to unlinesPS which really
is an improvement (for the reason you guessed).

This has nothing to do with a18PerformIO. I think we have no choice but
to trust the regular ByteString API is safe to use, whatever dirty
tricks the implementation might use. But the Internal API is really
extremely dangerous (indeed much more so than C programming since you
have to take things like the GC and lazy evaluation into account).
msg20091 (view) Author: gh Date: 2018-03-29.22:47:49
Alright, so here is the follow-up patch, and I'm accepting the whole bundle.

1 patch for repository http://darcs.net:

patch 37f1e53411ed8bbe7577f757391ec5735d4ee938
Author: Guillaume Hoffmann <guillaumh@gmail.com>
Date:   Thu Mar 29 19:46:05 -03 2018
  * rollback definition of linePS and use singleton where appropriate
Attachments
msg20092 (view) Author: gh Date: 2018-03-30.18:38:35
Here is an urgent followup patch that I am going to self accept to avoid
involuntary repository format changes elsewhere.

The previous bundle made binary primitive patches to be written as
hexadecimal with uppercase numbers (ABCDEF) instead of lowercase numbers
(abcdef). The current version of Darcs has no problem reading these
primitive patches again, but when pushing a named patch with such hex
strings to another server with an older Darcs, this triggers a failure.
That happened to me this morning while pushing the SHA1 patch to darcs.net.

Writing a test for such property (being compatible with older darcsen)
would require some work. In fact we might have to download darcs
binaries from some location and run them on the machine where the tests
occur.

1 patch for repository http://darcs.net:

patch 863bf570e9f5e232bc5b90cad9fc735da747c55f
Author: Guillaume Hoffmann <guillaumh@gmail.com>
Date:   Fri Mar 30 14:40:47 -03 2018
  * fix writing of hex strings in Darcs.Util.ByteString
  Numbers a to f should we written lowercase, otherwise internal
  patch files and bundles become incompatible with older darcsen.
Attachments
msg20093 (view) Author: bf Date: 2018-03-31.21:35:16
Oh, good one. I should have checked that the output is lower case.
History
Date User Action Args
2018-03-29 18:57:18ghcreate
2018-03-29 20:31:17ghsetnosy: + bf
messages: + msg20087
2018-03-29 22:23:04bfsetmessages: + msg20089
2018-03-29 22:47:49ghsetstatus: needs-review -> accepted
files: + rollback-definition-of-lineps-and-use-singleton-where-appropriate.dpatch
messages: + msg20091
2018-03-30 18:38:35ghsetfiles: + fix-writing-of-hex-strings-in-darcs_util_bytestring.dpatch
messages: + msg20092
2018-03-31 21:35:16bfsetmessages: + msg20093