darcs

Patch 1256 optimization: compress patch data before... (and 2 more)

Title optimization: compress patch data before... (and 2 more)
Superseder Nosy List bfrk
Related Issues
Status accepted Assigned To
Milestone 2.10.0

Created on 2015-02-04.21:18:26 by bfrk, last changed 2015-02-13.14:47:04 by bfrk.

Files
File name Status Uploaded Type Edit Remove
optimization_-compress-patch-data-before-sending-over-ssh.dpatch bfrk, 2015-02-04.21:18:26 application/x-darcs-patch
patch-preview.txt bfrk, 2015-02-04.21:18:26 text/x-darcs-patch
unnamed bfrk, 2015-02-04.21:18:26
See mailing list archives for discussion on individual patches.
Messages
msg17997 (view) Author: bfrk Date: 2015-02-04.21:18:26
3 patches for repository http://darcs.net/screened:

patch 8a1d2ccbb3deecd9466820ba8810149f55a3e6a7
Author: Ben Franksen <benjamin.franksen@helmholtz-berlin.de>
Date:   Wed Feb  4 21:38:00 CET 2015
  * optimization: compress patch data before sending over ssh
  
  We also make sure that the stdout of the sender and the stdin of the
  receiver are in block buffering mode. For large (e.g. 20MB) pushes to a
  remote repo via ssh this reduce the runtime to a less than a third of what
  it took before.
  
  We could do even better (I think) by sending the raw patch files, since the
  latter are normally already in compressed form. However, this would require
  some refactoring of the way we retrieve the data from the local repository,
  as well as some sort of negotiation between sender and receiver, so that we
  remain compatible with old versions of darcs at the remote end.
  
  The current patch is compatible back to at least darcs-2.5, since reading
  patch bundles from stdin already checks for compressed input data and calls
  gzDecompress if it sees the characteristic byte sequence.

patch 28f42ae82c0891ac2ecbd3993121a77e003829c4
Author: Ben Franksen <benjamin.franksen@helmholtz-berlin.de>
Date:   Wed Feb  4 21:39:50 CET 2015
  * cleanup: removed no longer used option compressActions

patch b0c0b4f5019fc40d550be65b5408f9625236f7c8
Author: Ben Franksen <benjamin.franksen@helmholtz-berlin.de>
Date:   Wed Feb  4 22:06:21 CET 2015
  * add --[no-]compress option to push command (with --compress the default)
  
  This is to enable interoperation with ancient remote darcs installations
  that are not capable of dealing with compressed input data. Tested with
  darcs-1.0.9rc1 (that version was packaged with debian 4.0).
Attachments
msg18022 (view) Author: bfrk Date: 2015-02-05.16:16:52
self-accept
msg18024 (view) Author: bfrk Date: 2015-02-05.16:23:06
I have set this to accepted-pending-tests (even though it is in reviewed
now) because I would really like others to try this out, especially with
older versions of the remote darcs, to see if anything untoward comes up
yet.
msg18034 (view) Author: gh Date: 2015-02-05.19:40:32
We usually wait one week before self-accepting non-trivial changes.
I'm going to test it soon.
msg18054 (view) Author: gh Date: 2015-02-09.23:57:58
Does block buffering mode improve speed?

Is it necessary for the remote darcs to be in block buffering mode? I've
tested with an older remote darcs (2.8) and pushing worked.

Is switching to block buffering mode an independent change from
compressing the data?

About the code itself:

* "optimization: compress patch data before sending over ssh": ok
* "cleanup": ok
* "add --[no-]compress option to push command": ok. +1 for using the
existing flags.
msg18060 (view) Author: bfrk Date: 2015-02-10.19:32:56
> Does block buffering mode improve speed?

I did not measure it but just assumed that block buffering must be
better in this case.

What a mistake!

I have now made a few test runs in which I explicitly set either no
buffering or block buffering in the local (Darcs.Util.External) and the
remote darcs (in Darcs.Util.ByteString.gzReadStdin) and ran all 4
combinations with both small and large pushes.

The result is that buffering seems to make no significant difference for
the sender (local darcs). On the receiver (remote darcs) no buffering
seems to give slightly better numbers; however, the difference is very
small and almost lost in the noise.

This is all completely contrary to my expectations.

> Is it necessary for the remote darcs to be in block buffering mode?

No. Both ends can independently chose buffering however they like.

> Is switching to block buffering mode an independent change from
> compressing the data?

Yes. I really should have recorded two separate patches. And I wish I
would have waited before pushing the patch to reviewed. Can we just
obliterate it (from screened and reviewed) and let me send an amended
version?
msg18062 (view) Author: gh Date: 2015-02-10.21:33:21
>> Is switching to block buffering mode an independent change from
>> compressing the data?
>
> Yes. I really should have recorded two separate patches. And I wish I
> would have waited before pushing the patch to reviewed. Can we just
> obliterate it (from screened and reviewed) and let me send an amended
> version?

Probably not, because we have a couple of mirrors out there on
hub.darcs.net that already have the patches (if you don't obliterate a
patch within 1 hour, they are pulled by the mirrors).

I think it's better if you send a followup patch removing block buffering.
msg18065 (view) Author: bfrk Date: 2015-02-10.23:44:36
Patch1268 rolls back the buffering changes.
msg18119 (view) Author: bfrk Date: 2015-02-13.14:47:04
Updating because it is now in reviewed and 2.10 branch (with the
additional patch 1268 that rolls back the buffering changes).
History
Date User Action Args
2015-02-04 21:18:26bfrkcreate
2015-02-04 21:47:19bfrksetstatus: needs-screening -> needs-review
2015-02-05 16:16:52bfrksetstatus: needs-review -> accepted-pending-tests
messages: + msg18022
milestone: 2.10.0
2015-02-05 16:23:06bfrksetmessages: + msg18024
2015-02-05 19:40:32ghsetmessages: + msg18034
2015-02-09 23:57:58ghsetmessages: + msg18054
2015-02-10 19:32:56bfrksetmessages: + msg18060
2015-02-10 21:33:21ghsetmessages: + msg18062
2015-02-10 23:44:36bfrksetmessages: + msg18065
2015-02-13 14:47:04bfrksetstatus: accepted-pending-tests -> accepted
messages: + msg18119