Patch 1680: five Bytestring-related patches from Ben's branch - Darcs bug tracker

Title	five Bytestring-related patches from Ben's branch
Superseder		Nosy List	bfrk, gh
Related Issues
Status	accepted	Assigned To
Milestone

Created on 2018-03-29.18:57:18 by gh, last changed 2018-03-31.21:35:16 by bfrk.

Files
File name	Status	Uploaded	Type	Edit	Remove
fix-writing-of-hex-strings-in-darcs_util_bytestring.dpatch		gh, 2018-03-30.18:38:35	application/octet-stream
more-efficient-definition-of-unlinesps.dpatch		gh, 2018-03-29.18:57:17	application/octet-stream
rollback-definition-of-lineps-and-use-singleton-where-appropriate.dpatch		gh, 2018-03-29.22:47:49	application/octet-stream

See mailing list archives for discussion on individual patches.

Messages
msg20085 (view)	Author: gh	Date: 2018-03-29.18:57:17
Pulled from Ben's branch [1] https://hub.darcs.net/bf/darcs-bf-latest, the patches are: 5 patches for repository http://darcs.net: patch 89f8ff164b855cb9b350f243f3289fa6cc37cc15 Author: Ben Franksen <benjamin.franksen@helmholtz-berlin.de> Date: Thu Feb 22 16:14:56 -03 2018 * more efficient definition of unlinesPS patch 33fba47adbf043d882560351297e90ca73fa2a0d Author: Ben Franksen <ben.franksen@online.de> Date: Thu Feb 22 06:58:00 -03 2018 * refactor, simplify, and document hunk application This refactor does two things: First, it decouples the optimization that applies sequences of hunks to the same file in one go from the algorithm that applies a single hunk. Second, single hunk application is somewhat simplified and extensively documented. This is very subtle and highly optimized code that easily breaks. This also moves breakAfter/BeforeNthNewline to D.P.Prim.V1.Commute. These functions are only used in this one place. They are undocumented and have subtle semantics, which makes them unsuitable for re-use. patch b414441536de216a6f64bca526aa23f29d7d3d0e Author: Ben Franksen <benjamin.franksen@helmholtz-berlin.de> Date: Wed Mar 28 12:47:39 -03 2018 * re-format import list of Darcs.Util.ByteString patch 082276dea147999b8b775e82596a4191345388b0 Author: Ben Franksen <ben.franksen@online.de> Date: Wed Mar 28 12:56:56 -03 2018 * fix betweenLinesPS and add a unit test This is part of the program to get rid of using ByteString internals. The previous version of betweenLinesPS was buggy because it relied on implementation details of unlinesPS, see issue2573. patch 48318938217fde44f04a66edd1bd1e91423178bb Author: Ben Franksen <benjamin.franksen@helmholtz-berlin.de> Date: Wed Mar 28 16:07:51 -03 2018 * throw out all access to bytestring internals from Darcs.Util.ByteString The only remaining function, unsafeWithInternals, is used only in Darcs.Util.Hash so it has been moved into that module. The two C files src/fpstring.[ch] have been removed. [1] https://lists.osuosl.org/pipermail/darcs-users/2018-March/027312.html Attachments more-efficient-definition-of-unlinesps.dpatch
msg20087 (view)	Author: gh	Date: 2018-03-29.20:31:16
First, a reference before reading these patches is http://bugs.darcs.net/issue2573 The review: # more efficient definition of unlinesPS Basically this: -unlinesPS [] = BC.empty -unlinesPS x = BC.init $ BC.unlines x +unlinesPS [] = B.empty +unlinesPS x = B.concat $ intersperse (BC.pack "\n") x Looks correct, I guess this is more efficient becauses it does not append then remove a last "\n". Why not use (BC.singleton '\n') instead of (BC.pack "\n")? # refactor, simplify, and document hunk application This is a heavy one. The idea of breakAfter/BeforeNthNewline to D.P.Prim.V1.Commute makes sense. That second function's type is changed (along with the type of other functions of Commute). applyHunkLines now returns an Either String FileContents instead of a Maybe FileContents, to improve error reporting. Also applyHunkLines applied a list of hunks to some FileContents, each hunk being a triploe (offset,to_remove, to_insert). applyHunkLines was therefore recursive Now the new version of this function applies a single hunk. This makes the code simpler to follow. The caller to applyHunkLines (in both old and new versions) is applyHunk(s). But in fact applyHunkLines was always called with a list of one hunk as argument (in hunkmod). So this simplification of the code is welcome. Also this helps simplify the instance Apply Prim for the case of applying a hunk. As for correctness, I am just going to trust the quickcheck tests, which pass. OK then. # re-format import list of Darcs.Util.ByteString Just indenting and moving stuff around. OK. # fix betweenLinesPS and add a unit test Reference: https://hackage.haskell.org/package/bytestring-0.10.8.2/docs/Data-ByteString.html#v:breakSubstring Reimplement betweenLinesPS in an efficient way. Also implement a simpler and less efficient spec_betweenLinesPS that is used to define a new QuickCheck property in the test suite. betweenLinesPS no longer uses linesPS and list-based functions, while spec_betweenLinesPS still does. A call to ByteString.Internals.toForeignPtr is removed. Again, I'm going to trust QuickCheck on that one. OK. # throw out all access to bytestring internals from Darcs.Util.ByteString Functions like isFunky, hashPS, fromPS2Hex, fromHex2PS are now defined without C functions , so we can drop fpstring.c, yay! We still have one use of unsafeWithInternals in Darcs.Util.Hash (to calculate SHA1 hashes). This will go away when we switch to cryptonite. One question. linesPS definition is changed from: ~~~ linesPS ps \| B.null ps = [B.empty] \| otherwise = BC.split '\n' ps ~~~ To: ~~~ linesPS ps = case search ps of Nothing -> [ps] Just n -> B.take n ps : linesPS (B.drop (n + 1) ps) where search = BC.elemIndex '\n' ~~~ Is the reason of this change to avoid the use of BC.split/B.split: https://hackage.haskell.org/package/bytestring-0.10.8.2/docs/src/Data.ByteString.html#split whose definition contains a call to accursedUnutterablePerformIO, a "bad" function, judging from the mail thread mentioned in http://bugs.darcs.net/issue2573 ? Then how do we know that B.take, B.drop, BC.elemIndex will never be implemented in terms of this bad function?
msg20089 (view)	Author: bfrk	Date: 2018-03-29.22:23:04
> Why not use (BC.singleton '\n') instead of (BC.pack "\n")? No particular reason I can think of. > # refactor, simplify, and document hunk application > This is a heavy one. Indeed. The code is subtle I needed quite a while to understand it. It is still not as clear as I would like it to be. One of the conclusions I took away from this refactor is that FileUUID does the right thing here: its hunks aren't line based any more, a hunk takes out a slice of the content and replaces it with another one. This means apply is trivial and the complexities of how we interpret the data as "lines" is moved out of the core and into the user interface. > linesPS definition is changed... I must say I can't remember why I was thinking this is an improvement. It looks like a mistake to me now. The previous definition is more concise and I see no reason why it should be less efficient. Please roll it back. And thanks for spotting it. I was probably confusing this with the change to unlinesPS which really is an improvement (for the reason you guessed). This has nothing to do with a18PerformIO. I think we have no choice but to trust the regular ByteString API is safe to use, whatever dirty tricks the implementation might use. But the Internal API is really extremely dangerous (indeed much more so than C programming since you have to take things like the GC and lazy evaluation into account).
msg20091 (view)	Author: gh	Date: 2018-03-29.22:47:49
Alright, so here is the follow-up patch, and I'm accepting the whole bundle. 1 patch for repository http://darcs.net: patch 37f1e53411ed8bbe7577f757391ec5735d4ee938 Author: Guillaume Hoffmann <guillaumh@gmail.com> Date: Thu Mar 29 19:46:05 -03 2018 * rollback definition of linePS and use singleton where appropriate Attachments rollback-definition-of-lineps-and-use-singleton-where-appropriate.dpatch
msg20092 (view)	Author: gh	Date: 2018-03-30.18:38:35
Here is an urgent followup patch that I am going to self accept to avoid involuntary repository format changes elsewhere. The previous bundle made binary primitive patches to be written as hexadecimal with uppercase numbers (ABCDEF) instead of lowercase numbers (abcdef). The current version of Darcs has no problem reading these primitive patches again, but when pushing a named patch with such hex strings to another server with an older Darcs, this triggers a failure. That happened to me this morning while pushing the SHA1 patch to darcs.net. Writing a test for such property (being compatible with older darcsen) would require some work. In fact we might have to download darcs binaries from some location and run them on the machine where the tests occur. 1 patch for repository http://darcs.net: patch 863bf570e9f5e232bc5b90cad9fc735da747c55f Author: Guillaume Hoffmann <guillaumh@gmail.com> Date: Fri Mar 30 14:40:47 -03 2018 * fix writing of hex strings in Darcs.Util.ByteString Numbers a to f should we written lowercase, otherwise internal patch files and bundles become incompatible with older darcsen. Attachments fix-writing-of-hex-strings-in-darcs_util_bytestring.dpatch
msg20093 (view)	Author: bfrk	Date: 2018-03-31.21:35:16
Oh, good one. I should have checked that the output is lower case.

History
Date	User	Action	Args
2018-03-29 18:57:18	gh	create
2018-03-29 20:31:17	gh	set	nosy: + bfrk messages: + msg20087
2018-03-29 22:23:04	bfrk	set	messages: + msg20089
2018-03-29 22:47:49	gh	set	status: needs-review -> accepted files: + rollback-definition-of-lineps-and-use-singleton-where-appropriate.dpatch messages: + msg20091
2018-03-30 18:38:35	gh	set	files: + fix-writing-of-hex-strings-in-darcs_util_bytestring.dpatch messages: + msg20092
2018-03-31 21:35:16	bfrk	set	messages: + msg20093