darcs

Issue 2379 only clone repositories with packs when they are up-to-date

Title: only clone repositories with packs when they are up-to-date
Priority:
Status: resolved
Milestone:
Resolved in: 2.10.0 HEAD
Superseder:
Nosy List: darcs-devel, gh, simon
Assigned To:
Topics:

Created on 2014-04-15.20:47:32 by gh, last changed 2014-05-04.20:17:26 by noreply.

Messages
msg17352 Author: gh Date: 2014-04-15.20:47:30
Packs aim to make repo cloning via HTTP faster. To create packs, the
user must run "darcs optimize --http", which creates packs corresponding
to the current state of the repository.

When packs get outdated (because of new patches), "darcs get" fetches the
packs anyway and then applies the missing patches. The problem is that
outdated packs make cloning *slower* than cloning without packs, since
patch application can be costly.

So I suggest a little change of format and behaviour:

* when creating packs, copy pristine hash to _darcs/packs/pristine
* when getting, compare remote _darcs/packs/pristine to the pristine
hash of _darcs/hashed_inventory
* if _darcs/packs/pristine does not exist, or hash is different, get
normally, otherwise get with packs (function copyPackedRepository2)

Basically this makes darcs clone a repository with packs only when they
are up-to-date (modulo pristine hash collisions, which can happen, mostly
if the missing patches are tags).

As a bonus, this is backward compatible with darcs 2.8; in any case, packs
were not enabled by default, so I guess we can change them as we wish.
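
As a rough illustration of the check proposed above (not darcs code), the
decision can be sketched in Python. Two things are assumptions here: that
_darcs/packs/pristine holds the bare pristine hash, and that the pristine
hash appears on the first line of _darcs/hashed_inventory after a
"pristine:" prefix. The function names are hypothetical, and the remote
files are passed in as strings rather than fetched.

```python
from typing import Optional

# Assumption: the first line of hashed_inventory looks like "pristine:<hash>".
PRISTINE_PREFIX = "pristine:"


def pristine_hash_of_inventory(inventory_text: str) -> Optional[str]:
    """Return the pristine hash from the inventory's first line, if present."""
    lines = inventory_text.splitlines()
    if lines and lines[0].startswith(PRISTINE_PREFIX):
        return lines[0][len(PRISTINE_PREFIX):].strip()
    return None


def should_use_packs(packs_pristine: Optional[str], inventory_text: str) -> bool:
    """Use packs only when packs/pristine exists and matches the inventory.

    packs_pristine is the content of the remote _darcs/packs/pristine,
    or None when that file does not exist.
    """
    if packs_pristine is None:
        return False  # no marker file: fall back to a normal get
    current = pristine_hash_of_inventory(inventory_text)
    return current is not None and packs_pristine.strip() == current
```

A missing marker file and a mismatching hash both lead to a normal get, so
repositories packed before this change would simply never use their packs.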

Related:

* <http://darcs.net/Internals/OptimizeHTTP>
* <http://irclog.perlgeek.de/darcs/2014-04-15#i_8592088>
msg17353 Author: gh Date: 2014-04-15.20:49:02
Sorry for the lack of proofreading; here's a correction:

Outdated packs do *not* make cloning *systematically* slower, but they
can become slower over time.
msg17354 Author: kowey Date: 2014-04-16.09:00:09
Wasn't the idea behind packs supposed to be that we would fetch from
both sides and meet in the middle?


-- 
Eric Kow <http://erickow.com>
msg17356 Author: gh Date: 2014-04-17.18:00:08
Yes that was the idea, but in the case of getting the last pristine
state, it does not work well in all cases, since outdated packs require
downloading and applying extra patches, which unfortunately is slow in
some real-world cases.

One toy case I made for the sake of the argument is this repo:
<http://www.cs.famaf.unc.edu.ar/~hoffmann/badpacks/>  It has 2 patches,
one that introduces a big binary file, and another that replaces its
contents with only a few bytes. Cloning it without packs is much faster
than with.

And I can't think of any way of predicting whether it's worth using
packs+new patches versus pristine downloading.

Now for getting the whole history... actually yes, the "meeting in the
middle" idea works, since we just want to download all patches. So for
the patches pack, I think we should use it in all cases.

That is, my proposal is now:

* when creating packs, copy pristine hash to _darcs/packs/pristine
* when getting, compare remote _darcs/packs/pristine to the pristine
hash of _darcs/hashed_inventory
* if _darcs/packs/pristine does not exist, or hash is different, get
the pristine cache normally, otherwise get it with packs (beginning of
function copyPackedRepository2)
* if _darcs/packs/patches.tar.gz exists, grab this pack and patches in
parallel (end of function copyPackedRepository2)
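
The revised proposal makes two independent decisions, one for the pristine
(basic) pack and one for the patches pack. A minimal Python sketch of that
logic follows; it is not darcs code, and ClonePlan and plan_clone are
hypothetical names. The hashes and the existence of patches.tar.gz are
passed in as plain values, assuming they were already read from the remote
repository.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClonePlan:
    pristine_from_pack: bool  # fetch pristine via the basic pack?
    patches_from_pack: bool   # fetch patches.tar.gz alongside single patches?


def plan_clone(packs_pristine: Optional[str],
               current_pristine: str,
               patches_pack_exists: bool) -> ClonePlan:
    """Decide how to fetch pristine and patches under the revised proposal."""
    # The basic pack is used only when its recorded hash matches the
    # pristine hash currently listed in the remote inventory.
    basic_pack_fresh = (packs_pristine is not None
                        and packs_pristine.strip() == current_pristine.strip())
    # The patches pack is used whenever it exists, even if it is outdated,
    # since missing patches can be fetched individually in parallel.
    return ClonePlan(pristine_from_pack=basic_pack_fresh,
                     patches_from_pack=patches_pack_exists)
```

For example, plan_clone("h0", "h1", True) skips the outdated basic pack but
still uses the patches pack.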
msg17429 Author: noreply Date: 2014-05-04.20:17:25
The following patch sent by Guillaume Hoffmann <guillaumh@gmail.com> updated issue2379 with
status=resolved;resolvedin=2.10.0 HEAD

* resolve issue2379: only use packs to copy pristine when up-to-date 
Ignore-this: 76acb197a8a681ef92c496819b08add5

When creating packs, save pristine hash to _darcs/packs/pristine
If basic pack is outdated, do not fetch it, but fetch patches pack
anyway.
In Darcs.Repository, separate functions between the ones that fetch
basic repository and complete repository (packed or not), and
separate function that clones old-fashioned repositories.

History
Date                 User     Action  Args
2014-04-15 20:47:32  gh       create
2014-04-15 20:49:03  gh       set     messages: + msg17353
2014-04-15 21:08:06  gh       set     title: only clone repositories with packs when they are up-to-date -> only use with packs when up-to-date
2014-04-15 21:14:55  gh       set     title: only use with packs when up-to-date -> only use packs when up-to-date
2014-04-16 09:00:11  kowey    set     messages: + msg17354
                                      title: only use packs when up-to-date -> only clone repositories with packs when they are up-to-date
2014-04-16 13:35:57  gh       set     nosy: + darcs-devel
2014-04-16 13:36:18  gh       set     nosy: + simon
2014-04-17 18:00:10  gh       set     messages: + msg17356
2014-05-04 20:17:26  noreply  set     status: unknown -> resolved
                                      messages: + msg17429
                                      resolvedin: 2.10.0 HEAD