darcs

Issue 1589 darcs unpull --last=1 takes 7 seconds

Title darcs unpull --last=1 takes 7 seconds
Priority bug Status resolved
Milestone Resolved in
Superseder make darcs record, etc. O(1) in time (is now O(n) wrt num patches)
View: 1106
Nosy List darcs-devel, dmitry.kurochkin, kowey, mornfall, simonmar
Assigned To
Topics Performance

Created on 2009-08-30.12:10:52 by simonmar, last changed 2010-05-08.23:19:27 by kowey.

Files
File name Uploaded Type Edit Remove
darcs-unpull-1.prof.gz simonmar, 2009-08-30.12:10:48 application/x-gzip
Messages
msg8593 (view) Author: simonmar Date: 2009-08-30.12:10:48
With a hashed GHC repository, "darcs unpull --last=1 -a" takes over 7 seconds
with a recent darcs (tested 2.2, 2.3, and darcs-hs).

Profile attached.  The problem seems to be that darcs is optimising the
inventory, although the inventory is already optimised and nothing needs to be
done.  It is apparently writing out about 30% of my inventory files (going by
the modification times on the inventory files).

To reproduce: get a hashed GHC repository (http://darcs.haskell.org/ghc), and
run "darcs unpull --last=1 -a".
Attachments
msg8616 (view) Author: kowey Date: 2009-08-31.09:06:11
(Simon: by the way, is it faster with the old-fashioned repo?)

Marking need-action as I think there are two specific questions that we can
begin by answering:

1. Where do the two optimizes come from in the code?
2. Why are there two of them?

I believe the answer to #1 is that it comes from an explicit call somewhere in
unpull and maybe an implicit one introduced to fix issue873 or issue433.  I'm
not sure if these are the right tickets or if there even are tickets.  Try darcs
changes -p optimize.

As for #2, I believe it's just historical accident.  One was specific and
introduced first; the second was generic and introduced next, but we never got
rid of the specific one.  See, for example:

Wed Mar 19 16:02:23 GMT 2008  David Roundy <droundy@darcs.net>
  * we don't need extra call to optimizeInventory in Tag.

So do we have another instance of this? Or this state of affairs necessary for
any particular reason?
msg8781 (view) Author: simonmar Date: 2009-09-10.08:43:40
Re: (Simon: by the way, is it faster with the old-fashioned repo?)

Yes.  I don't have a way to do a rigorous comparison just now, but I tried on a
darcs-1 repo at work and it took 5.5s to unpull --last=1, and 2s to pull the
patch again.  Those are NFS-mounted repos, with darcs 2.2.1.
msg8782 (view) Author: igloo Date: 2009-09-10.10:43:18
FWIW, on an ext3 FS, with an old-fashioned GHC HEAD repo and darcs 1.0.9rc1:
$ echo y | /usr/bin/time darcs unpull --last=1
Wed Sep  9 12:57:57 BST 2009  Simon Marlow <marlowsd@gmail.com>
  * Fix up logic to decide whether shared libs are being built or not
  Ignore-this: 7709d980f416318eae71e8bae255fa2d
  
  Now, adding dyn to $(GhcLibWays) is how shared libs are enabled.
  Everything else keys off that, rather than testing
  $(BuildSharedLibs).
Shall I unpull this patch? (1/1)  [ynWvpxqadjk], or ? for help: 
Finished unpulling.
1.00user 0.10system 0:01.11elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+8408outputs (0major+11317minor)pagefaults 0swaps

I believe the 1 second is due to a call to wait_a_moment (== sleep 1), but I
haven't followed the code through to be sure of that.
msg8783 (view) Author: kowey Date: 2009-09-10.11:06:07
Ian: it looks like darcs-1.0.9 is way faster, but just to be sure could you
please repeat that with darcs 2.3 (old-fashioned) and also with hashed?

I'd like to see all three numbers using the same setup
msg8809 (view) Author: mornfall Date: 2009-09-14.12:43:56
(Just a sidenote, the time says it's 1 second of *user* time, not elapsed, and
CPU usage of 99% -- that either means the wait_a_moment hypothesis is wrong or
that sleep in GHC is brutally inefficient.)
msg8839 (view) Author: mornfall Date: 2009-09-17.20:37:27
Timing on my laptop:
hashed -- 2.9s, ~ 50% tentativelyRemovePatches, 50% finalizeRepositoryChanges
plain -- 0.98s, no profile

here, the difference is not so dramatic, but it's still rather big. It probably
depends how big the patch is (how many files it affects) since the difference
is probably proportional to this. About 24% of the time is spent (indirectly)
in writeHashFile. Also, darcs reads a lot of inventories, which is part of the
problem -- I think David worked on this in his branch, but I'm not sure how to
isolate these changes and whether they help at all... It's the same story as
SlurpDirectory -- lots of ad-hoc IO interleaving that behaves rather
unpredictably. So basically there are two fronts of attack here: make general
filesystem access more efficient, and fix the inventory problem (which may be
pretty hard though).
msg9885 (view) Author: simonmar Date: 2010-01-21.14:35:47
On 17/09/2009 21:37, Petr Ročkai wrote:
>
> Petr Ročkai<me@mornfall.net>  added the comment:
>
> Timing on my laptop:
> hashed -- 2.9s, ~ 50% tentativelyRemovePatches, 50% finalizeRepositoryChanges
> plain -- 0.98s, no profile
>
> here, the difference is not so dramatic, but it's still rather big. It probably
> depends how big the patch is (how many files it affects) since the difference
> is probably proportional to this. About 24% of the time is spent (indirectly)
> in writeHashFile. Also, darcs reads a lot of inventories, which is part of the
> problem -- I think David worked on this in his branch, but I'm not sure how to
> isolate these changes and whether they help at all... It's the same story as
> SlurpDirectory -- lots of ad-hoc IO interleaving that behaves rather
> unpredictably. So basically there are two fronts of attack here: make general
> filesystem access more efficient, and fix the inventory problem (which may be
> pretty hard though).

I expect the main underlying problem here is issue1106.

Cheers,
	Simon
msg9886 (view) Author: mornfall Date: 2010-01-21.14:55:32
Hi,

Simon Marlow <marlowsd@gmail.com> writes:
> I expect the main underlying problem here is issue1106.
yes, agreed -- I have since writing that comment unravelled the
inventory code and it's clearly O(n) in number of patches to update
inventories.

I have darcs patched locally to print every inventory write so that it
annoys me to no end and I actually get around to fixing the problem.

;-)

Yours,
   Petr.

PS: An example...

15:53:35 | morn@twi:~/dev/divine/branch-2.0 -> darcs unpull
Thu Jan 21 15:02:15 CET 2010  Petr Rockai <me@mornfall.net>
  * Remove unused code from the DVE compiler.
Shall I unpull this patch? (1/14)  [ynWvplxdaqjk], or ? for help: y
Thu Jan 21 14:56:34 CET 2010  Petr Rockai <me@mornfall.net>
  * Lift common code out of branching in the DVE compiler output.
Shall I unpull this patch? (2/14)  [ynWsfvplxdaqjk], or ? for help: d
write_inventory_private: resthash = Nothing
write_inventory_private: resthash = Just "0000156473-53cd0e544c44c8b55beb69d99486a870f6dbf15188176d117c3e62e1fd20cfee"
write_inventory_private: resthash = Just "0000039236-65602e019537793353a4ded458ad7864abff8cedcdb532490aa301835f5cc465"
write_inventory_private: resthash = Just "0000005464-0c688891864dc674f97d223b17282ef1df128921427f88855c43d7abb70852f3"
write_inventory_private: resthash = Just "0000005796-a6a121cc0d0914194033bb56bc9dfb268f7b89e51a3b5b46ec3557d1a79817a8"
write_inventory_private: resthash = Just "0000010812-d141f5c6795154eb020816fe3908795c31cb7fd17dbc4198946bc624fe0636ba"
write_either_inventory at Just "0000003104-b1b0a26c06978386ff12907de000eb213be3c3b905faa7cef019e515542b1086"
write_inventory_private: resthash = Nothing
write_inventory_private: resthash = Just "0000156473-53cd0e544c44c8b55beb69d99486a870f6dbf15188176d117c3e62e1fd20cfee"
write_inventory_private: resthash = Just "0000039236-65602e019537793353a4ded458ad7864abff8cedcdb532490aa301835f5cc465"
write_inventory_private: resthash = Just "0000005464-0c688891864dc674f97d223b17282ef1df128921427f88855c43d7abb70852f3"
write_inventory_private: resthash = Just "0000005796-a6a121cc0d0914194033bb56bc9dfb268f7b89e51a3b5b46ec3557d1a79817a8"
write_inventory_private: resthash = Just "0000010812-d141f5c6795154eb020816fe3908795c31cb7fd17dbc4198946bc624fe0636ba"
write_either_inventory at Just "0000003104-b1b0a26c06978386ff12907de000eb213be3c3b905faa7cef019e515542b1086"

(this also reveals that we are, probably quite needlessly, going through
this chore twice... that may also be much easier to fix than the
O(n)-ness problem. On pull, this only happens once.)
msg10039 (view) Author: kowey Date: 2010-02-19.16:47:37
This ticket is waiting for issue1100 to be resolved.  When that has
happened, we should verify that no longer happens and close the ticket.
msg11027 (view) Author: mornfall Date: 2010-05-08.15:52:03
darcs unpull --last 1 -a  0,13s user 0,03s system 95% cpu 0,168 total

... on hashed GHC repo. I think we can consider this fixed.
msg11028 (view) Author: simonmar Date: 2010-05-08.20:57:41
On 08/05/10 16:52, Petr Ročkai wrote:
>
> Petr Ročkai<me@mornfall.net>  added the comment:
>
> darcs unpull --last 1 -a  0,13s user 0,03s system 95% cpu 0,168 total
>
> ... on hashed GHC repo. I think we can consider this fixed.

Fantastic news, thanks Petr!

Cheers,
	Simon
History
Date User Action Args
2009-08-30 12:10:52simonmarcreate
2009-08-30 12:15:13koweysetpriority: bug
nosy: kowey, darcs-devel, simonmar, dmitry.kurochkin
topic: + Performance
2009-08-31 09:06:15koweysetstatus: unknown -> needs-reproduction
nosy: kowey, darcs-devel, simonmar, dmitry.kurochkin
messages: + msg8616
2009-09-10 08:43:42simonmarsetnosy: kowey, darcs-devel, simonmar, dmitry.kurochkin
messages: + msg8781
2009-09-10 10:43:20igloosetnosy: + igloo
messages: + msg8782
2009-09-10 10:46:59igloosetnosy: - igloo
2009-09-10 11:06:11koweysetstatus: needs-reproduction -> waiting-for
nosy: + igloo
messages: + msg8783
assignedto: igloo
2009-09-10 11:09:18koweysetnosy: - igloo
2009-09-14 10:36:44koweysetpriority: bug -> urgent
nosy: kowey, darcs-devel, simonmar, dmitry.kurochkin
topic: + Target-2.4
2009-09-14 12:43:58mornfallsetnosy: + mornfall
messages: + msg8809
2009-09-17 20:37:31mornfallsetnosy: kowey, darcs-devel, simonmar, dmitry.kurochkin, mornfall
messages: + msg8839
2009-10-23 22:36:46adminsetnosy: + marlowsd, - simonmar
2009-10-23 23:35:16adminsetnosy: + simonmar, - marlowsd
2009-11-15 16:14:30tux_rockersettopic: - Target-2.4
2010-01-21 14:35:52simonmarsetmessages: + msg9885
2010-01-21 14:55:37mornfallsetmessages: + msg9886
2010-02-19 16:47:39koweysetpriority: urgent -> bug
messages: + msg10039
superseder: + make darcs record, etc. O(1) in time (is now O(n) wrt num patches)
assignedto: igloo ->
2010-05-08 15:52:04mornfallsetstatus: waiting-for -> resolved
messages: + msg11027
2010-05-08 20:57:42simonmarsetstatus: resolved -> unknown
messages: + msg11028
2010-05-08 23:19:27koweysetstatus: unknown -> resolved