Patch 2216 use ShortByteString for Hash content (and 16 more)

Title: use ShortByteString for Hash content (and 16 more)
Superseder:
Nosy List: bf
Related Issues:
Status: needs-review
Assigned To:

Created on 2021-11-04.23:32:13 by bf, last changed 2021-11-05.07:54:09 by bf.

File name                                    Status  Uploaded                 Type
patch-preview.txt                                    bf, 2021-11-04.23:32:12  text/x-darcs-patch
use-shortbytestring-for-hash-content.dpatch          bf, 2021-11-04.23:32:12  application/x-darcs-patch
See mailing list archives for discussion on individual patches.
msg22919 Author: bf Date: 2021-11-04.23:32:12
Refactors related to patch/inventory/pristine hashes.

17 patches for repository http://darcs.net/screened:

patch 7984dfa59af4ab318bbc1208bd54baa4e290b750
Author: Ben Franksen <ben.franksen@online.de>
Date:   Sun Mar  7 17:25:25 CET 2021
  * use ShortByteString for Hash content

  This should bring down memory use and decrease fragmentation.
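
A minimal sketch of the idea, not the code from the patch: the `Hash` newtype and the helpers `mkHash` and `hashBytes` are hypothetical names. A regular `ByteString` keeps its payload in pinned memory that the garbage collector cannot move, while a `ShortByteString` is unpinned and has lower per-value overhead, which is the likely source of the savings mentioned above.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Short as BS

-- Hypothetical stand-in for the real hash type: the content is stored as
-- an unpinned ShortByteString rather than a pinned ByteString.
newtype Hash = Hash BS.ShortByteString
  deriving (Eq, Ord, Show)

-- toShort copies the bytes, so the hash no longer keeps the (possibly much
-- larger) buffer it was sliced from alive.
mkHash :: B.ByteString -> Hash
mkHash = Hash . BS.toShort

-- Convert back where an API needs a regular ByteString.
hashBytes :: Hash -> B.ByteString
hashBytes (Hash s) = BS.fromShort s
```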

patch d56b7945d20bd2e29a0a2131d255394c17a63d03
Author: Ben Franksen <ben.franksen@online.de>
Date:   Sun Mar  7 11:14:10 CET 2021
  * cache: refactor cacheHash and remove export

patch 091c755d6bde86d0e0c1dd7b0a60aacb9fc3fab2
Author: Ben Franksen <ben.franksen@online.de>
Date:   Mon Mar  8 10:36:08 CET 2021
  * avoid re-validation of already validated patch hashes

patch 64ebee50e76201dc651da2b632a905d89f0db193
Author: Ben Franksen <ben.franksen@online.de>
Date:   Sun Mar 14 09:34:56 CET 2021
  * validate hashes in inventories on the ByteString side

  For reasons I haven't been able to figure out, this drastically reduces
  memory consumption.

patch fbd443da22fdb6ec3012b95f2ba70059f2af5007
Author: Ben Franksen <ben.franksen@online.de>
Date:   Sat Mar 27 08:53:24 CET 2021
  * move hash validation from D.R.Inventory to D.Util.Cache

patch 1a649931c95cf11bde605b324738cf64fe01602f
Author: Ben Franksen <ben.franksen@online.de>
Date:   Mon Apr 12 07:54:17 CEST 2021
  * make possible non-existence of hashes explicit

  This removes the NoHash constructor and (more or less mechanically) replaces
  Hash with Maybe Hash. This exposes lots of situations where we missed out on
  more precise typing, i.e. where we know we have a hash but still work with a
  Maybe Hash. This patch doesn't clean these up; it just allows and encourages
  us to do so.
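
To illustrate the kind of typing this enables, here is a small sketch with simplified, hypothetical types (`TreeItem`, `formatEntry` and `formatItem` are not the darcs definitions). A function that needs a hash can demand a plain `Hash`, and callers that only hold a `Maybe Hash` are forced to handle the missing case.

```haskell
-- Hypothetical, simplified types for illustration only.
newtype Hash = Hash String
  deriving (Eq, Show)

-- Absence of a hash is now visible in the type instead of being a
-- special constructor of Hash itself.
data TreeItem
  = File (Maybe Hash)
  | SubTree (Maybe Hash) [TreeItem]

itemHash :: TreeItem -> Maybe Hash
itemHash (File h) = h
itemHash (SubTree h _) = h

-- Requires a hash, and says so in its signature.
formatEntry :: String -> Hash -> String
formatEntry name (Hash h) = name ++ " " ++ h

-- Callers with only a Maybe Hash must decide what a missing hash means.
formatItem :: String -> TreeItem -> Either String String
formatItem name item =
  maybe (Left ("no hash for " ++ name)) (Right . formatEntry name) (itemHash item)
```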

patch aa120e78415d2f17b0df30b196ea7d6450f0a46c
Author: Ben Franksen <ben.franksen@online.de>
Date:   Fri Apr 16 11:53:52 CEST 2021
  * use decoding to validate hashes

  This also delegates the implementation of okayHash to okayHashB.
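
A rough sketch of validation by decoding, under stated assumptions: the signatures of `okayHash` and `okayHashB` are guessed, `decodeBase16` is a hypothetical helper, and the expected size of 32 bytes (a SHA-256 digest) is assumed for illustration. The point is that a hash is accepted exactly when it decodes.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.Char (digitToInt, isHexDigit)

-- Decode a base16 string; Nothing on odd length or a non-hex character.
decodeBase16 :: B.ByteString -> Maybe B.ByteString
decodeBase16 s
  | odd (B.length s) = Nothing
  | otherwise = B.pack <$> go (BC.unpack s)
  where
    go (a : b : rest)
      | isHexDigit a && isHexDigit b =
          (fromIntegral (16 * digitToInt a + digitToInt b) :) <$> go rest
      | otherwise = Nothing
    go [] = Just []
    go _ = Nothing

-- Assumption: a valid hash decodes to exactly 32 bytes.
okayHashB :: B.ByteString -> Bool
okayHashB s = case decodeBase16 s of
  Just bytes -> B.length bytes == 32
  Nothing -> False

-- The String version simply delegates to the ByteString version.
okayHash :: String -> Bool
okayHash = okayHashB . BC.pack
```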

patch 08d3835579f633bf1b03250a00a54b9c79f91cbb
Author: Ben Franksen <ben.franksen@online.de>
Date:   Wed Mar 24 03:32:16 CET 2021
  * fix generator for hashes in D.T.R.Inventory

patch a5e7aac823fbb0e216473d93e714dfc9b477e428
Author: Ben Franksen <ben.franksen@online.de>
Date:   Sat Apr 17 12:16:23 CEST 2021
  * cleanup parsing and unparsing of hashed directories

  The parser now uses Darcs.Util.Parser. The function decodeWhiteName, which is
  used by the parser, is now explicit about decoding errors. In contrast, the
  unparsing is actually *not* supposed to fail: it has existing hashes for the
  subitems as a precondition; indeed we call hash update functions in various
  places before calling darcsFormatDir. As it stands, this is quite brittle and
  should be improved, but this has to wait for another patch.

patch 703d61283b18b26fe7b1a72a123180706a625608
Author: Ben Franksen <ben.franksen@online.de>
Date:   Sun Mar  7 11:13:27 CET 2021
  * cache: inline copyFilesUsingCache

patch cc644a31e458ba99c533b789ecf3e50664e16712
Author: Ben Franksen <ben.franksen@online.de>
Date:   Sat Mar 27 06:48:14 CET 2021
  * move D.R.Inventory to D.R.Inventory.Format

  This is so that we can move the code concerned with reading and writing
  inventories to D.R.Inventory without mixing the inventory format with its

patch 0d657b2f0ad029650cf514d3d62af20c835082c5
Author: Ben Franksen <ben.franksen@online.de>
Date:   Fri Apr 16 13:24:03 CEST 2021
  * major refactor: internally store valid hashes in parsed form

  The main theme here is the "parse, don't validate" mantra. Storing hashes
  (optionally including the content size) in parsed form is memory efficient
  and gives us much better typing. The code for this has been moved into its
  own module Darcs.Util.ValidHash with a safe API. PatchInfoAnd now stores its
  hash as a PatchHash and the Tagged sections of a PatchSet store their hash
  as an InventoryHash. The HashedDir is now inferred from the type of the
  hash, which means we no longer have to pass it to the functions exported by
  Darcs.Util.Cache. This simplifies the API and makes it more type safe (yet
  note that not a single 'forall' had to be added to type signatures).

  This refactor exposed a strange HashedDir mismatch in the handling of packs
  that I temporarily marked as FIXME. I suspect that some files are not placed
  in the right directories, resulting in a loss of efficiency when cloning
  packed repos. This needs further investigation.
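
One way the "parse, don't validate" approach could look is sketched below. Everything here is an assumption made for illustration: the class methods `dirOf`, `encodeHash` and `decodeHash`, the helpers `dirName` and `hashedFilePath`, and the rule that a hash is 64 hex digits are not taken from Darcs.Util.ValidHash. The sketch shows how a type class lets the directory follow from the hash type, so callers never pass a HashedDir and no 'forall' is needed.

```haskell
import Data.Char (isHexDigit)

data HashedDir = HashedPristineDir | HashedPatchesDir | HashedInventoriesDir
  deriving (Eq, Show)

-- Constructors would stay private in a real module, so the only way to
-- obtain a value is through decodeHash.
newtype PatchHash = PatchHash String deriving (Eq, Show)
newtype InventoryHash = InventoryHash String deriving (Eq, Show)

class ValidHash h where
  dirOf :: h -> HashedDir          -- directory determined by the type
  encodeHash :: h -> String
  decodeHash :: String -> Maybe h  -- parsing is the only validation step

-- Assumed validity rule for this sketch: exactly 64 hex digits.
parseWith :: (String -> h) -> String -> Maybe h
parseWith mk s
  | length s == 64 && all isHexDigit s = Just (mk s)
  | otherwise = Nothing

instance ValidHash PatchHash where
  dirOf _ = HashedPatchesDir
  encodeHash (PatchHash s) = s
  decodeHash = parseWith PatchHash

instance ValidHash InventoryHash where
  dirOf _ = HashedInventoriesDir
  encodeHash (InventoryHash s) = s
  decodeHash = parseWith InventoryHash

dirName :: HashedDir -> FilePath
dirName HashedPristineDir = "pristine.hashed"
dirName HashedPatchesDir = "patches"
dirName HashedInventoriesDir = "inventories"

-- No HashedDir argument: it is inferred from the type of the hash.
hashedFilePath :: ValidHash h => h -> FilePath
hashedFilePath h = dirName (dirOf h) ++ "/" ++ encodeHash h
```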

patch b45e0e2afe32fc6ea1fc9933b9fb5ee552a14c61
Author: Ben Franksen <ben.franksen@online.de>
Date:   Fri Jul 16 17:06:29 CEST 2021
  * fix a FIXME: HashedDir mismatch in the handling of packs

  It turned out that the HashedDir parameter wasn't used anywhere, so this
  patch removes it. This makes sense since filepaths in the pack files already
  contain the subdirectory of all files.

patch 7a591395b0b66b6754ee145489772bbedf5c09d2
Author: Ben Franksen <ben.franksen@online.de>
Date:   Sat Mar 27 07:18:05 CET 2021
  * move reading and writing of inventories from D.R.Hashed to D.R.Inventory

  This is a pure code move, except that I also cleaned up the layout of the
  import lists in D.R.Hashed.

patch 057ac977b25543eb7bfaf0e86b2965cd43e553f1
Author: Ben Franksen <ben.franksen@online.de>
Date:   Wed Apr 21 10:31:22 CEST 2021
  * refactor updateHashes in TreeMonad

  The central change here is in the type of updateHash from

    TreeItem m -> m (Maybe Hash)

  to

    Maybe (TreeItem m -> m Hash).

  Indeed, for a concrete instantiation we either have a total hash function or
  else no hash function at all. In the latter case this change avoids calling
  a procedure that always returns Nothing, potentially recursively over a
  large tree. It also gives us more precise typing.
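
The effect of the new type can be sketched as follows. The types are simplified and hypothetical (`TreeItem` here has no monad parameter, and `updateHashes` and `toyHash` are illustrative, not the TreeMonad code). With `Nothing` the tree is returned without any traversal; with `Just f` every item gets a hash, so `f` can be total.

```haskell
import Data.Functor.Identity (Identity (..))

newtype Hash = Hash Int
  deriving (Eq, Show)

-- Simplified tree for illustration only.
data TreeItem
  = File String (Maybe Hash)
  | SubTree [TreeItem] (Maybe Hash)
  deriving (Eq, Show)

itemHash :: TreeItem -> Maybe Hash
itemHash (File _ h) = h
itemHash (SubTree _ h) = h

updateHashes :: Monad m => Maybe (TreeItem -> m Hash) -> TreeItem -> m TreeItem
updateHashes Nothing item = pure item -- no hash function: skip the traversal
updateHashes (Just f) item = go item
  where
    go (File c _) = do
      h <- f (File c Nothing)
      pure (File c (Just h))
    go (SubTree kids _) = do
      kids' <- mapM go kids -- children first, so their hashes are available
      h <- f (SubTree kids' Nothing)
      pure (SubTree kids' (Just h))

-- Toy hash: content length for files, sum of child hashes for subtrees.
toyHash :: TreeItem -> Identity Hash
toyHash (File c _) = pure (Hash (length c))
toyHash (SubTree kids _) =
  pure (Hash (sum [n | Just (Hash n) <- map itemHash kids]))
```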

  Everything else follows from that, with the exception of 'flushItem' which
  was rewritten to make it clearer what happens: we first update the hash,
  then update the item on disk and then replace the item in the tree we are

patch 120ad503225a42ef147135dee5ef16877a2c4035
Author: Ben Franksen <ben.franksen@online.de>
Date:   Sun May 30 10:35:42 CEST 2021
  * cache: eliminate hashedFilePathReadOnly

  Since we pass the CacheLoc, we can make the distinction between bucketed
  (writable) and non-bucketed (not writable) cache inside hashedFilePath.

patch 1fc26cf24773f455adb4776ff2525b59e56f4456
Author: Ben Franksen <ben.franksen@online.de>
Date:   Sun Jul 18 16:15:37 CEST 2021
  * fix in Darcs.Patch.PatchInfoAnd.fmapH: must invalidate hash

  The function we map over the contained patch may modify it, so any hash
  we may have had is now invalid.
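
A heavily simplified sketch of the fix: this `PatchInfoAnd` record is hypothetical and leaves out the patch info and witness types of the real one. Because the mapped function may change the patch, the stored hash is dropped instead of being carried over.

```haskell
newtype PatchHash = PatchHash String
  deriving (Eq, Show)

-- Hypothetical simplification: a patch together with an optional hash.
data PatchInfoAnd p = PatchInfoAnd
  { piaHash :: Maybe PatchHash
  , piaPatch :: p
  }
  deriving (Eq, Show)

-- The old hash described the old patch; after mapping it may no longer
-- match, so the result carries no hash.
fmapH :: (a -> b) -> PatchInfoAnd a -> PatchInfoAnd b
fmapH f (PatchInfoAnd _ p) = PatchInfoAnd Nothing (f p)
```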
Date                 User  Action  Args
2021-11-04 23:32:13  bf    create
2021-11-05 07:54:09  bf    set     status: needs-screening -> needs-review