Issue 1556: task: abandon tentative files and keep the information in memory

Title	task: abandon tentative files and keep the information in memory
Priority	feature	Status	needs-implementation
Milestone		Resolved in
Superseder		Nosy List	darcs-devel, ganesh, kowey
Assigned To		Topics

Created on 2009-08-23.17:57:59 by kowey, last changed 2020-07-31.21:59:10 by bfrk.

Messages
msg8413 (view)	Author: kowey	Date: 2009-08-23.17:57:57
This is from David Roundy's msg5797 on issue992:
> Anyhow, this should be combined with a safety refactor which would ensure that the _darcs/hashed_inventory is only read once: we should store its contents in the Repository data structure, so we can't accidentally mix two views of a remote repository during one command. I don't think we currently make this mistake, but it's troubling that we could.
David goes on to comment on how this would fit into issue992:
> Once this refactor is done (which means that we'd read _darcs/hashed_inventory when first identifying the Repository), we can easily make darcs read _darcs/inventories/xxx instead, if the URL has some fancy format that includes a hash value. Or if a file with that hash isn't present in _darcs/inventories/ we'd look at _darcs/hashed_inventory to see if that has the right hash. This feature will enable self-authenticating URLs, albeit URLs that only describe a specific version.
msg12472 (view)	Author: kowey	Date: 2010-09-06.11:47:30
Petr says this is already done as part of his adventure refactor: http://irclog.perlgeek.de/darcs/2010-09-06#i_2790255
msg17406 (view)	Author: gh	Date: 2014-04-28.19:46:57
I think this was never ported from adventure to HEAD, so marking it as "needs-implementation" again.
msg18751 (view)	Author: gh	Date: 2015-09-22.14:36:46
I would extend the scope of the proposal, following Petr's observations:
> But the actual motivation for that was getting rid of the tentative files, which are superfluous and, to some extent, dangerous.
> You don't need to dump the intermediate states to disk, really.
> And they are dangerous because the API allows direct access to both non-tentative and tentative stuff.
So the scope would be:
* always read hashed_inventory once
* never write tentative_hashed_inventory, tentative_pristine and maybe pending.tentative; instead keep them in memory.
That would reduce Darcs' filesystem IO footprint, which is welcome especially in cases like repositories on sshfs or Dropbox.
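A minimal sketch of what "keep them in memory" could look like, under stated assumptions: every type and function name below (RepoState, Repository, identify, modifyTentative, revertTentative, finalize, addPatch) is hypothetical and not taken from the darcs code base. It only illustrates the shape of the idea: the on-disk state is read once, changes accumulate in an in-memory tentative copy, and the disk is touched again only at finalization.

```haskell
-- Hypothetical types; none of these names come from the darcs code base.

-- | The pieces of state that currently have on-disk tentative copies.
data RepoState = RepoState
  { headInventory :: [String]  -- patch identifiers in the head inventory
  , pristineRoot  :: String    -- hash of the pristine tree root
  , pendingPatch  :: [String]  -- pending changes
  } deriving (Eq, Show)

-- | A repository handle that carries both views in memory.
data Repository = Repository
  { repoLocation :: FilePath
  , recorded     :: RepoState  -- read from disk once, at identification
  , tentative    :: RepoState  -- modified in memory during a command
  } deriving (Eq, Show)

-- | Build the handle from state that was read exactly once.
identify :: FilePath -> RepoState -> Repository
identify path st = Repository path st st

-- | Apply a change to the tentative state only; nothing touches the disk.
modifyTentative :: (RepoState -> RepoState) -> Repository -> Repository
modifyTentative f repo = repo { tentative = f (tentative repo) }

-- | Discard in-memory changes (the analogue of dropping tentative files).
revertTentative :: Repository -> Repository
revertTentative repo = repo { tentative = recorded repo }

-- | Promote tentative to recorded; also return the state to write to disk.
finalize :: Repository -> (Repository, RepoState)
finalize repo = (repo { recorded = t }, t)
  where t = tentative repo

-- | Example modification: append a patch identifier to the head inventory.
addPatch :: String -> RepoState -> RepoState
addPatch p st = st { headInventory = headInventory st ++ [p] }
```

With this shape, code that holds a Repository value cannot mix two on-disk views during one command, because the only disk read happens before identify is called.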
msg20215 (view)	Author: bfrk	Date: 2018-07-19.11:23:02
Abandoning the tentative files and instead keeping them in memory sounds like a viable alternative to my idea of tracking the transaction state in a type witness. However, we must make sure that we do not accidentally keep the whole repo in memory, only the head inventory. Another obvious idea is to abandon hashed_inventory (including the tentative version). Instead, always store the head inventory hashed and keep only its hash in a file under _darcs. It may be difficult to make this change in a backward compatible way, though.
msg20216 (view)	Author: bfrk	Date: 2018-07-21.15:06:32
Looking at the code for writeTentativeInventory reveals that we already store the _darcs/hashed_inventory (minus the pristine hash) inside _darcs/inventories in standard hashed format. Thus we could compress _darcs/hashed_inventory to a file that contains only the pristine and inventory hash, e.g. _darcs/current_state. Reading this file, caching the two hashes in the Repository token, and updating it whenever the repo is modified would be cheap and could be done for each command. We should be careful to write the parser for this file in a way that allows future extensions of the format, e.g. for when we add versions or branches. We must continue to write hashed_inventory, though, to remain compatible with previous versions of Darcs. So here is my revised plan:
* always read current_state and pending once
* never write tentative_hashed_inventory, tentative_pristine, and pending.tentative; instead keep them in memory as members of Repository
* on finalization, write hashed_inventory, pending, and current_state, each atomically, by first writing to a temporary name and then renaming (like we already do for pending)
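A sketch of the two mechanical parts of this plan, under stated assumptions: the "key: value" line format, the key names "inventory" and "pristine", the ".tmp" suffix, and all function names are invented for illustration, since the message does not specify a format for _darcs/current_state. The parser keeps unknown keys so that a later format extension round-trips through older code, and the writer uses the write-then-rename pattern. renameFile comes from the directory package; whether the rename is atomic depends on the platform and on both names living on the same filesystem.

```haskell
import System.Directory (renameFile)

-- | Hypothetical contents of a current_state file.
data CurrentState = CurrentState
  { inventoryHash :: String
  , pristineHash  :: String
  , otherFields   :: [(String, String)]  -- unknown keys, kept for extensibility
  } deriving (Eq, Show)

-- | Parse "key: value" lines; fail only if a required key is missing.
parseCurrentState :: String -> Maybe CurrentState
parseCurrentState txt = do
  inv <- lookup "inventory" fields
  pri <- lookup "pristine" fields
  pure (CurrentState inv pri
          [ kv | kv@(k, _) <- fields, k `notElem` ["inventory", "pristine"] ])
  where
    fields =
      [ (key, dropWhile (== ' ') (drop 1 rest))
      | l <- lines txt
      , let (key, rest) = break (== ':') l
      , not (null rest)  -- skip lines without a colon
      ]

-- | Render in the same format, required keys first.
renderCurrentState :: CurrentState -> String
renderCurrentState st = unlines
  [ k ++ ": " ++ v
  | (k, v) <- ("inventory", inventoryHash st)
            : ("pristine", pristineHash st)
            : otherFields st
  ]

-- | Write to a temporary name in the same directory, then rename over the target.
writeAtomically :: FilePath -> String -> IO ()
writeAtomically path contents = do
  let tmp = path ++ ".tmp"
  writeFile tmp contents
  renameFile tmp path
```

Finalization would then call writeAtomically once per file (hashed_inventory, pending, current_state), so a reader sees either the old or the new version of each file, never a partially written one.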
msg20217 (view)	Author: bfrk	Date: 2018-07-21.15:14:49
This move could be accompanied by adding a Repository witness for the pending state (wP).
msg20225 (view)	Author: bfrk	Date: 2018-07-23.08:48:53
I am currently pursuing the idea of making this change in a way that is compatible with and prepares for future extensions regarding in-repo branches. The idea is to condense hashed_inventory and patches/pending into small files that contain three hashes: the inventory hash, the pristine hash, and the pending hash. This requires the current head inventory to be hashed, but apparently we already do that (see writeTentativeInventory). We also need to hash pending, which we do not do yet. These branch files live under a new directory _darcs/branches. The first step adds only a single branch named "current". We read that file once when we identify a repo, falling back to reading the old special files if it does not exist, and creating the branch file if this is our local repository. The Repository type is extended to contain the branch data, consisting of the three hashes and the branch name. On finalization we atomically write the branch back to disk (but we need to maintain the old special files, too, for compatibility). If we take care to make the format extensible (in a forward and backward compatible way) we can add more hashes later, e.g. for the unrevert bundle or the rebase patch, once these are converted to a hashable format or uncoupled from the normal patches, respectively. I much prefer this refactor over the idea of tracking transaction mode in the types.
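A rough sketch of the "read the branch file once, else fall back to the old special files" step, under stated assumptions: the Branch record, the whitespace-separated three-hash file format, and the names parseBranch and loadBranch are all hypothetical. The legacy reader is passed in as an IO action, so the sketch makes no claim about how the old files are actually read. Trailing fields are ignored, which is one simple way to leave room for additional hashes later.

```haskell
import Control.Exception (evaluate)
import System.Directory (doesFileExist)

-- | Hypothetical branch data: a name plus the three hashes.
data Branch = Branch
  { branchName    :: String
  , inventoryHash :: String
  , pristineHash  :: String
  , pendingHash   :: String
  } deriving (Eq, Show)

-- | Parse "inventory pristine pending [future fields...]".
-- Extra trailing fields are ignored so that newer formats still parse.
parseBranch :: String -> String -> Maybe Branch
parseBranch name txt = case words txt of
  (inv : pri : pend : _) -> Just (Branch name inv pri pend)
  _                      -> Nothing

-- | Read the branch file if it exists and parses; otherwise run the
-- caller-supplied fallback that reads the old special files.
loadBranch :: FilePath -> String -> IO Branch -> IO Branch
loadBranch file name legacy = do
  exists <- doesFileExist file
  if not exists
    then legacy
    else do
      txt <- readFile file
      _ <- evaluate (length txt)  -- force the lazy read to completion
      maybe legacy pure (parseBranch name txt)
```

The result of loadBranch would be stored in the Repository value at identification time, so later operations in the same command consult the in-memory hashes instead of re-reading the files.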
msg20226 (view)	Author: bfrk	Date: 2018-07-23.13:27:25
Okay, if this is a preparation for branches, then I want to get things right. I made a conceptual mistake when I proposed a _darcs/branches/current file. So a named branch is a file under _darcs/branches named like the branch. We need exactly one branch to be active ("checked out" in git terms). How do we represent that? A simple solution is to maintain a copy of the active branch file as _darcs/active_branch. We could also store just the branch name in _darcs/active_branch (akin to a symbolic link), but that doesn't fit well with the idea of introducing branches step-by-step: we'd have to conjure up a name for the single branch in a traditional single-branch repo. So, for the first step we don't actually need the _darcs/branches directory, just a single file _darcs/active_branch or perhaps just _darcs/active.
msg20227 (view)	Author: bfrk	Date: 2018-07-25.11:55:42
The more I hack on this the more I like it. This refactor turns the Repository API into something much more functional in style than it used to be. While most of the functions still return IO actions (because we have to read and write the hashed store), we largely avoid /stateful/ IO: what we read and write is determined by the hashes, so we don't actually modify state in a semantic sense... except right at the end when we finalize the branch data for consumption by subsequent darcs commands. The whole "tentative" vs. "recorded" distinction simply disappears.

History
Date	User	Action	Args
2009-08-23 17:57:59	kowey	create
2009-08-23 18:00:41	kowey	link	issue992 superseder
2009-08-25 18:16:07	admin	set	nosy: + darcs-devel, - simon
2009-08-27 14:30:49	admin	set	nosy: kowey, darcs-devel, thorkilnaur, dmitry.kurochkin
2009-09-06 21:04:02	kowey	set	topic: + Hashed nosy: kowey, darcs-devel, thorkilnaur, dmitry.kurochkin
2010-04-03 12:06:40	kowey	set	topic: + Library
2010-09-06 11:47:31	kowey	set	status: needs-implementation -> has-patch assignedto: mornfall messages: + msg12472 nosy: + mornfall
2014-04-28 19:46:58	gh	set	status: has-patch -> needs-implementation messages: + msg17406
2015-09-22 14:36:48	gh	set	topic: - Hashed, Library nosy: + ganesh, - thorkilnaur, dmitry.kurochkin, mornfall assignedto: mornfall -> messages: + msg18751 milestone: 2.12.0
2018-07-19 11:23:04	bfrk	set	messages: + msg20215
2018-07-21 15:06:33	bfrk	set	messages: + msg20216 title: task: safety refactor to ensure that hashed_inventory is only read once -> task: abandon tentative files and keep the information in memory
2018-07-21 15:14:50	bfrk	set	messages: + msg20217
2018-07-23 08:48:54	bfrk	set	messages: + msg20225
2018-07-23 13:27:26	bfrk	set	messages: + msg20226
2018-07-23 13:28:21	bfrk	set	milestone: 2.12.0 -> 2.14.2
2018-07-25 11:55:44	bfrk	set	messages: + msg20227
2020-07-31 21:59:10	bfrk	set	milestone: 2.14.2 ->
