darcs

Issue 1678 performance regression in check/whatsnew/record (2.3.x)

Title performance regression in check/whatsnew/record (2.3.x)
Priority bug Status resolved
Milestone Resolved in
Superseder Nosy List darcs-devel, dmitry.kurochkin, kowey, mornfall, quick
Assigned To
Topics Performance, Regression

Created on 2009-11-09.00:34:21 by quick, last changed 2010-01-22.16:03:20 by kowey.

Messages
msg9216 (view) Author: quick Date: 2009-11-09.00:34:19
Darcs 2.3.1 is than 2.2.0, which was already too slow for comfort:

$ $ time /Programs/Darcs/2.2.0/bin/darcs check
The repository is consistent!                                        
                                                                     
real    0m28.305s
user    0m27.421s
sys     0m1.009s
$ time /Programs/Darcs/2.3.1/bin/darcs check
The repository is consistent!                                        
                                                                     
real    0m36.585s
user    0m34.830s
sys     0m0.861s



$ time /Programs/Darcs/2.2.0/bin/darcs w
No changes!                                                          
                                                                     
real    0m6.781s
user    0m6.341s
sys     0m0.683s
$ time /Programs/Darcs/2.3.1/bin/darcs w
No changes!

real    0m14.968s
user    0m13.949s
sys     0m0.991s


$ time /Programs/Darcs/2.2.0/bin/darcs record -m'test record' -a
No changes!                                                          
                                                                     
real    0m6.859s
user    0m6.384s
sys     0m0.714s
$ time /Programs/Darcs/2.3.1/bin/darcs record -m'test record' -a
No changes!                                                          
                                                                     
real    0m8.560s
user    0m7.550s
sys     0m0.386s

$ darcs show repo
          Type: darcs
        Format: hashed
          Root: /Mount/Work/per_dtest
      Pristine: HashedPristine
...
   Num Patches: 2913
$ du -sh _darcs/pristine.hashed/
28M     _darcs/pristine.hashed/
$
msg9222 (view) Author: kowey Date: 2009-11-09.11:43:26
Thanks for the details, Kevin.  Any more details about the repository?  For
example, how many files are in it?  What's the depth?

Also: would you be willing to report back with the HEAD darcs?

(If you're really keen, there's also the darcs-benchmark program you can cabal
install.  Maybe this would help you to compare notes with us on some shared
repositories just to make sure...)
msg9244 (view) Author: quick Date: 2009-11-14.05:22:55
4843 files + 260 dirs = 5103 elements
max depth ~= 6-7 directories deep

Timing is quite interesting now: I'm running this on the repo that I issued a 
darcs repair on using the latest dev darcs (to fix issue1677).  The repair 
seems to have made 2.2.0 a little slower but 2.3.1 much faster... especially 
after the first run:

$ time /Programs/Darcs/2.2.0/bin/darcs w
No changes!               
                          
real    0m7.720s
user    0m7.138s
sys     0m0.870s

$ time /Programs/Darcs/2.3.1/bin/darcs w
No changes!

real    0m3.656s
user    0m1.148s
sys     0m0.127s

$ time /Programs/Darcs/2.3.1/bin/darcs w
No changes!

real    0m0.152s
user    0m0.123s
sys     0m0.028s

And the latest dev darcs is even faster:

$ time /Mount/KWQ/darcs.latest/dist/build/darcs/darcs w
No changes!

real    0m1.174s
user    0m1.041s
sys     0m0.121s

$ $ time /Mount/KWQ/darcs.latest/dist/build/darcs/darcs w
No changes!

real    0m0.042s
user    0m0.022s
sys     0m0.017s

So it would seem that the darcs repair was needed to enable this performance 
boost.  The darcs 2.3.1 repair was unsuccessful and didn't result in the 
speedup, but the current dev version is rocket fuel.

I'll try to install cabal soon to run the darcs-benchmark.
msg9824 (view) Author: kowey Date: 2010-01-15.17:23:02
Hi Kevin,

Here are some recent numbers from darcs-benchmark on some canned repositories of
the darcs and the GHC source:

=== darcs ===

                ||  darcs-2.2.1        | darcs-2.3.1       |  darcs
================++=====================+===================+===============
          check ||        11.7s 183.0M |       11.0s 55.0M |  42.4s 258.0M
         repair ||        11.8s 183.0M |       11.1s 55.0M |  44.2s 272.0M


=== ghc-hashed ===

                ||  darcs-2.2.1         | darcs-2.3.1        |   darcs
================++======================+====================+===============
          check ||       172.3s 1210.0M |      160.5s 146.0M |       -      -
         repair ||       173.0s 1287.0M |      127.7s 146.0M | 1190.6s 355.0M

For details, see
 - http://lists.osuosl.org/pipermail/darcs-users/2010-January/022832.html
 - http://lists.osuosl.org/pipermail/darcs-users/2010-January/022833.html

Now, these are focused on check/repair (see the benchmarks for some darcs what
numbers), but I thought you should see them.

One question which puzzles me is why the almost-Darcs-2.4 you used to repair the
repo made things go faster.  Does this have something to do with Petr's hashed
storage work?  I'm guessing that there is some indirect effect here, for
example, that the repair had a side effect of doing darcs optimize --pristine,
or that it garbage collected the pristine dir...

Do you still have the unrepaired version of the repository?  Any insight into
the differences between them?
msg9832 (view) Author: mornfall Date: 2010-01-17.16:46:07
> === darcs ===
>
>                 ||  darcs-2.2.1        | darcs-2.3.1       |  darcs
> ================++=====================+===================+===============
>           check ||        11.7s 183.0M |       11.0s 55.0M |  42.4s 258.0M
>          repair ||        11.8s 183.0M |       11.1s 55.0M |  44.2s 272.0M
>
>
> === ghc-hashed ===
>
>                 ||  darcs-2.2.1         | darcs-2.3.1        |   darcs
> ================++======================+====================+===============
>           check ||       172.3s 1210.0M |      160.5s 146.0M |       -      -
>          repair ||       173.0s 1287.0M |      127.7s 146.0M | 1190.6s 355.0M
>
> For details, see
>  - http://lists.osuosl.org/pipermail/darcs-users/2010-January/022832.html
>  - http://lists.osuosl.org/pipermail/darcs-users/2010-January/022833.html
I have fixed the worst part of this problem in hashed-storage. Currently
only available in its darcs version.

On my machine: darcs-2.3 = 1:53, darcs-HEAD = 2:49 (on ghc-hashed).

That's still a substantial difference, but much less than 2:06 against
almost 20 minutes. I don't think I will have the time to bring this
further down in time for darcs 2.4, so we'll have to accept this as a
price for the refactoring. (The primary cause for this was my crusade to
remove SlurpDirectory which is not yet complete, but it is reasonably
close to completion, so maybe in 2.5 we can get rid of that...)

Yours,
   Petr.
msg9863 (view) Author: quick Date: 2010-01-19.14:47:37
Tried to update the bug with numbers, but got this:

MOD_PYTHON ERROR

ProcessId:      3595
Interpreter:    'bugs.darcs.net'

ServerName:     'bugs.darcs.net'
DocumentRoot:   '/var/lib/roundup/trackers/darcs'

URI:            '/issue1678'
Location:       None
Directory:      '/var/lib/roundup/trackers/'
Filename:       '/var/lib/roundup/trackers/darcs/dummy.py'
PathInfo:       '/issue1678'

Phase:          'PythonHandler'
Handler:        'roundup.cgi.apache'

Traceback (most recent call last):

  File "/usr/lib/python2.5/site-packages/mod_python/importer.py", line 1537, in HandlerDispatch
    default=default_handler, arg=req, silent=hlist.silent)

  File "/usr/lib/python2.5/site-packages/mod_python/importer.py", line 1229, in _process_target
    result = _execute_target(config, req, object, arg)

  File "/usr/lib/python2.5/site-packages/mod_python/importer.py", line 1128, in _execute_target
    result = object(arg)

  File "/usr/lib/python2.5/site-packages/roundup/cgi/apache.py", line 95, in handler
    _tracker = roundup.instance.open(_home, not _debug)

  File "/usr/lib/python2.5/site-packages/roundup/instance.py", line 257, in open

  File "/usr/lib/python2.5/site-packages/roundup/instance.py", line 54, in __init__

  File "/usr/lib/python2.5/site-packages/roundup/instance.py", line 79, in get_backend_name

IOError: [Errno 24] Too many open files: '/var/lib/roundup/trackers/darcs/db/backend_name'


Sending them via this email instead.  Here are the numbers for both
the unrepaired and the repaired repo.  Note that only the 2.2.0
numbers are valid for the unfixed repo are valid: 2.3.1 and the
current darcs both abort with the zip file error.


=== dtest ===

                || darcs2.2.0        | darcs2.3.1        | darcs.net       
================++===================+===================+=================
     get (full) ||      30.6s 154.0M |       3.9s 159.0M |     29.4s 159.0M
get (lazy, x10) ||     107.2s   7.0M |       3.1s   9.0M |    118.2s   8.0M
       pull 100 ||      16.4s  23.0M |       8.8s  29.0M |         -      -
       annotate ||          -      - |          -      - |         -      -
         wh x50 ||     290.4s   0.0M |       0.3s   0.0M |     38.4s   0.0M
     wh mod x50 ||     290.5s  10.0M |       8.4s   0.0M |     38.4s   0.0M
      wh -l x20 ||     368.2s  83.0M |       5.1s  74.0M |     16.9s   0.0M
          check ||      27.4s 274.0M |       5.0s 275.0M |         -      -
         repair ||      27.2s 272.0M |       4.8s 275.0M |     77.4s 392.0M
      pull 1000 ||      49.1s  85.0M |       0.4s  84.0M |     39.5s  84.0M

=== drepair ===

                || darcs2.2.0        | darcs2.3.1        | darcs.net       
================++===================+===================+=================
     get (full) ||      29.6s 153.0M |      36.3s 160.0M |     32.8s 159.0M
get (lazy, x10) ||     114.4s   5.0M |     114.3s   8.0M |    118.7s   7.0M
       pull 100 ||      18.3s  28.0M |      18.0s  25.0M |      7.2s  29.0M
       annotate ||          -      - |          -      - |         -      -
         wh x50 ||     317.5s   0.0M |       7.8s   0.0M |      9.8s   0.0M
     wh mod x50 ||     316.7s  11.0M |       8.0s   6.0M |     10.1s   4.0M
      wh -l x20 ||     412.3s 145.0M |     403.0s 128.0M |     12.3s   5.0M
          check ||      27.6s 224.0M |      34.7s 275.0M |     76.7s 410.0M
         repair ||      27.5s 275.0M |      34.1s 274.0M |     78.0s 392.0M
      pull 1000 ||      48.7s  84.0M |      49.3s  86.0M |     39.6s  84.0M


          Type: darcs
        Format: hashed
      Pristine: HashedPristine
   Num Patches: 2913
 Max dir depth: 5
         Files: 4842


-KQ


On Sun, Jan 17, 2010 at 04:46:12PM +0000, Petr Ročkai wrote:
> 
> Petr Ročkai <me@mornfall.net> added the comment:
> 
> > === darcs ===
> >
> >                 ||  darcs-2.2.1        | darcs-2.3.1       |  darcs
> > ================++=====================+===================+===============
> >           check ||        11.7s 183.0M |       11.0s 55.0M |  42.4s 258.0M
> >          repair ||        11.8s 183.0M |       11.1s 55.0M |  44.2s 272.0M
> >
> >
> > === ghc-hashed ===
> >
> >                 ||  darcs-2.2.1         | darcs-2.3.1        |   darcs
> > ================++======================+====================+===============
> >           check ||       172.3s 1210.0M |      160.5s 146.0M |       -      -
> >          repair ||       173.0s 1287.0M |      127.7s 146.0M | 1190.6s 355.0M
> >
> > For details, see
> >  - http://lists.osuosl.org/pipermail/darcs-users/2010-January/022832.html
> >  - http://lists.osuosl.org/pipermail/darcs-users/2010-January/022833.html
> I have fixed the worst part of this problem in hashed-storage. Currently
> only available in its darcs version.
> 
> On my machine: darcs-2.3 = 1:53, darcs-HEAD = 2:49 (on ghc-hashed).
> 
> That's still a substantial difference, but much less than 2:06 against
> almost 20 minutes. I don't think I will have the time to bring this
> further down in time for darcs 2.4, so we'll have to accept this as a
> price for the refactoring. (The primary cause for this was my crusade to
> remove SlurpDirectory which is not yet complete, but it is reasonably
> close to completion, so maybe in 2.5 we can get rid of that...)
> 
> Yours,
>    Petr.
> 
> __________________________________
> Darcs bug tracker <bugs@darcs.net>
> <http://bugs.darcs.net/issue1678>
> __________________________________
msg9894 (view) Author: kowey Date: 2010-01-22.16:03:15
What's that zip file error again?

Anyway, at the risk of cherry-picking the good news, this looks encouraging

    wh -l x20 ||     412.3s 145.0M |     403.0s 128.0M |     12.3s   5.0M

Anyway, I'm going to mark this as resolved.  Please do shout if you
think I'm making a mistake :-)

Ideally, we would have benchmarks for darcs record as well (but I
haven't really worked out a good way to do this; maybe randomly select
10 files, make chages and record?)
History
Date User Action Args
2009-11-09 00:34:21quickcreate
2009-11-09 11:43:32koweysetstatus: unknown -> waiting-for
title: darcs 2.3.1 performance degradation -> performance regression in check/whatsnew/record (2.3.x)
nosy: + kowey, mornfall
messages: + msg9222
topic: + Performance, Regression
assignedto: quick
2009-11-14 05:22:57quicksetmessages: + msg9244
2010-01-14 14:59:05koweysetpriority: wishlist -> bug
2010-01-15 17:23:06koweysetmessages: + msg9824
2010-01-17 16:46:12mornfallsetmessages: + msg9832
2010-01-19 14:47:41quicksetmessages: + msg9863
2010-01-22 16:03:20koweysetstatus: waiting-for -> resolved
assignedto: quick ->
messages: + msg9894