Let me introduce the term "version" to denote a repo state that we can
store efficiently. Each version contains the hash of its latest patch
(representation) and the hash of its parent version. This is like lists
in Haskell with shared tails. As an optimization, we can add (and
should) an optional inventory hash, see below.
Here is an example scenario. Suppose we commute the last two patches of
a repo. So the inventory has some patches (a, b) at its end, and we have
the current version vb=(b,va), va=(a,vx). We drop the last two patches
from the inventory, simultaneously going back to the parent of the
parent of the current version vb which is vx. The patches are commuted
and we get new representations (b', a') which we append, one by one, to
the inventory. For the first new patch b' we make a new version
vb'=(b',vx) and for a' we make va'=(a',vb'). Then va' is our new current
version. Suppose we now re-order (b',a') back to (a,b). We drop (b',a')
as before and start with vx. Now, when we append a, we make va''=(a,vx)
but now this has the same hash as that of va, so we share the version
file; and similar with b.
This suggests that the storage requirements are roughly two additional
hashes per patch representation. This is quite reasonable IMO.
Reconstructing a repo state from a version hash requires reading as many
version files as are patches in the head inventory. The first patch of
the head inventory contains the optional reference to the parent
inventory. When we hit such an inventory hash, we can proceed by
restoring the repo from the chain of inventories as we do now.
(BTW, an independent optimization we can and should add to Darcs is to
remember the pristine hash in /each/ inventory instead of only the head
inventory. This has the potential to speed up many operations that work
on the distant past of a repo, like cloning with --tag. I am not ready
to suggest we add the pristine hash to each version, but /if/ we did
that then we would arrive at something very similar to git. I have no
idea how git manages to optimize away the enormous storage requirements
that a naive implementation of its underlying model has. I think I will
ask Steve how they do that.)
Since we do not have branches (multiple head inventories under different
names) yet, the only reasonably safe way to restore a version is to make
a clone. We could consider adding a dangerous command to restore a
version in-place, with appropriately dire warnings attached.
|