darcs

Issue 2318 hashed_inventory is vulnerable to unguided search and replace

Title: hashed_inventory is vulnerable to unguided search and replace
Priority: bug
Status: unknown
Milestone:
Resolved in:
Superseder:
Nosy List: MaicoLeberle, ganesh, mdiaz
Assigned To:
Topics:

Created on 2013-04-22.21:24:31 by ganesh, last changed 2020-08-10.10:12:21 by bf.

Messages
msg16771 Author: ganesh Date: 2013-04-22.21:24:29
A user on IRC reported repository problems, manifesting as 

Failed to commute common patches
bug at src/Darcs/Patch/Depends.hs:249

The user had two repos and pulling between them was reliably causing 
this error. darcs check in both repos was fine.

After some investigating by the user, it turned out that 
_darcs/hashed_inventory in one repo had been affected by a general 
search and replace.

We ought to have integrity checking and/or compression on this file to 
avoid this problem.
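One way to get the integrity checking suggested above is to record a checksum of hashed_inventory whenever darcs writes it and to verify that checksum before the file is used. The Haskell below is only a sketch under stated assumptions: it uses 64-bit FNV-1a purely to stay free of extra packages (a real implementation would more likely use a cryptographic hash such as SHA-256), where the expected checksum is stored is left open, and the names `fnv1a64`, `checksum` and `verifyInventory` are hypothetical, not darcs code.

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.List (foldl')
import Data.Word (Word64)
import Numeric (showHex)

-- 64-bit FNV-1a over the characters of the file. Chosen only because it fits
-- in a few lines with no extra packages: it catches accidental edits such as
-- a stray search and replace, but it is not a cryptographic hash.
fnv1a64 :: String -> Word64
fnv1a64 = foldl' step 14695981039346656037
  where
    step h c = (h `xor` fromIntegral (ord c)) * 1099511628211

-- Checksum rendered as lowercase hex, as it might be stored on disk.
checksum :: String -> String
checksum contents = showHex (fnv1a64 contents) ""

-- Hypothetical check: accept the inventory only if it still matches the
-- checksum recorded when darcs last wrote it, so corruption is reported
-- up front instead of surfacing later as a failure to commute patches.
verifyInventory :: String -> String -> Either String String
verifyInventory expected contents
  | checksum contents == expected = Right contents
  | otherwise = Left "hashed_inventory does not match its recorded checksum"
```

A corrupted file then fails loudly at load time with a clear message, rather than producing the confusing "Failed to commute common patches" error during a pull.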
msg22380 Author: bf Date: 2020-08-02.09:05:59
We should replace _darcs/hashed_inventory with a small file that 
contains only the pristine hash and the current inventory hash. Let 
me call that file _darcs/head for reference; the actual name is up
for bikeshedding.

The format of this file should be such that we can add more hashes to 
it without breaking compatibility. For instance, a sequence of lines 
of the form "keyword:hash" with no spaces, where keyword matches [a-z_]+
and hash matches [0-9a-f]+. The internal representation retains /all/
(keyword,hash) pairs in a finite map and also restores them when
writing the file; the order of lines is irrelevant.
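The proposed format is simple enough to sketch. The Haskell below is a minimal illustration, assuming `Data.Map` serves as the finite map; `Head`, `parseHead`, `renderHead` and `setHash` are hypothetical names, not darcs code. Skipping blank lines and rejecting a repeated keyword are assumptions, since the proposal does not say how to treat them.

```haskell
import Data.Char (isDigit)
import qualified Data.Map.Strict as Map

-- Hypothetical in-memory form of the proposed _darcs/head file. Every
-- (keyword, hash) pair is kept, including keywords this darcs version does
-- not understand, so that writing the file back does not lose them.
type Head = Map.Map String String

-- keyword matches [a-z_]+
isKeywordChar :: Char -> Bool
isKeywordChar c = (c >= 'a' && c <= 'z') || c == '_'

-- hash matches [0-9a-f]+
isHashChar :: Char -> Bool
isHashChar c = isDigit c || (c >= 'a' && c <= 'f')

-- Parse a single "keyword:hash" line; spaces or other characters are errors.
parseLine :: String -> Either String (String, String)
parseLine l = case break (== ':') l of
  (k, ':' : h)
    | not (null k) && all isKeywordChar k
        && not (null h) && all isHashChar h -> Right (k, h)
  _ -> Left ("malformed line: " ++ show l)

-- Parse a whole file. Line order does not matter; blank lines are skipped
-- and a repeated keyword is rejected (both assumptions).
parseHead :: String -> Either String Head
parseHead s = do
  pairs <- mapM parseLine (filter (not . null) (lines s))
  let m = Map.fromList pairs
  if Map.size m == length pairs
    then Right m
    else Left "duplicate keyword"

-- Write every pair back out, one per line, sorted by keyword.
renderHead :: Head -> String
renderHead m = unlines [k ++ ":" ++ h | (k, h) <- Map.toAscList m]

-- Replace one hash (e.g. the inventory hash after a record), keep the rest.
setHash :: String -> String -> Head -> Head
setHash = Map.insert
```

Because unknown keywords such as a future `rebase` entry pass through `parseHead` and `renderHead` untouched, an older darcs that updates only the inventory hash would not drop entries added by a newer one.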

The reason I want this to be extendable is that we can then add 
hashes for things like the pending patch, the rebase patch, etc.

The reason I am proposing this solution at all is that it paves the 
way for internal branches.

The only question is how to make this change in a compatible way. I 
think that requires that we keep maintaining the 
_darcs/hashed_inventory file. We also need a way to find out which of 
_darcs/hashed_inventory or _darcs/head is the more current one. Can 
we just compare the modification timestamps?
msg22385 Author: ganesh Date: 2020-08-02.10:26:56
I think 'head' is a good idea. Is your intention to keep writing out both
head and hashed_inventory to maintain backwards compatibility? If so one
option would be to read both and let hashed_inventory override head if they
differ. It feels messy though and I would worry a bit about atomicity.
msg22426 Author: bf Date: 2020-08-10.10:12:18
> I think 'head' is a good idea. Is your intention to keep writing out both
> head and hashed_inventory to maintain backwards compatibility?

I see no other option if we want to maintain compatibility with existing
darcs versions.

> If so one option would be to read both and let hashed_inventory
> override head if they differ.

If we do that then a corrupt hashed_inventory will take precedence, too,
so we wouldn't gain anything regarding the issue at hand.

> It feels messy though and I would worry a bit about atomicity.

The messiness can be handled with suitable engineering (i.e. encapsulation).

I haven't given any thought to atomicity yet, but I don't expect this to
be a serious problem. (I may be wrong about that.)

I think the correct way to handle compatibility is by adding a new
format property as an alternative to "hashed"; let me use "head" as a
strawman. It means a repo with

  hashed|head

can be read but not modified by darcs versions that don't know "head".
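The compatibility rule described here can be modelled in a few lines of Haskell, assuming the format file holds one line per property group with alternatives separated by "|": a darcs version may read the repo if it knows at least one alternative on every line, and may modify it only if it knows every property on every line. This is a simplified, hypothetical model, not darcs' own format check; `parseFormat`, `canRead`, `canWrite` and the example property sets are made up for illustration.

```haskell
import qualified Data.Set as Set

-- Assumed model of a repository format file: one line per property group,
-- with alternatives on a line separated by '|', e.g. "hashed|head".
type Format = [[String]]

parseFormat :: String -> Format
parseFormat = map (splitOn '|') . filter (not . null) . lines

splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (w, [])       -> [w]
  (w, _ : rest) -> w : splitOn sep rest

-- Reading only needs one known alternative on every line...
canRead :: Set.Set String -> Format -> Bool
canRead known = all (any (`Set.member` known))

-- ...but writing needs every property on every line to be known.
canWrite :: Set.Set String -> Format -> Bool
canWrite known = all (all (`Set.member` known))
```

Under this model, a darcs that knows only `hashed` can still read a repo marked `hashed|head` but refuses to write to it, while a darcs that also knows `head` can do both.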

A better name for the format property might be "branched" or just "branch".

History
Date                 User     Action  Args
2013-04-22 21:24:31  ganesh   create
2014-03-22 23:53:39  gh       set     assignedto: mdiaz
                                      nosy: + mdiaz
2015-03-27 22:34:53  gh       set     assignedto: mdiaz -> MaicoLeberle
                                      nosy: + MaicoLeberle
2020-08-02 09:06:02  bf       set     messages: + msg22380
2020-08-02 10:26:58  ganesh   set     assignedto: MaicoLeberle ->
                                      messages: + msg22385
2020-08-10 10:12:21  bf       set     messages: + msg22426