darcs

Issue 1731 performance regression in check/repair (2.4.x)

Title performance regression in check/repair (2.4.x)
Priority bug Status resolved
Milestone Resolved in
Superseder Nosy List dagit, darcs-devel, dmitry.kurochkin, kowey, mornfall, quick, tux_rocker
Assigned To mornfall
Topics Performance, Regression

Created on 2010-01-15.17:24:49 by kowey, last changed 2010-03-15.11:00:55 by tux_rocker.

Files
File name Uploaded Type Edit Remove
darcs-2.4-beta-prof.prof dagit, 2010-01-16.19:08:19 application/octet-stream
include-the-exact-package-versions-darcs-was-compiled-with-in-__exact_version_.dpatch kowey, 2010-02-02.14:38:07 text/plain
unnamed dagit, 2010-01-16.23:46:49 text/html
unnamed dagit, 2010-01-22.16:09:07 text/html
unnamed dagit, 2010-01-22.21:24:24 text/html
unnamed dagit, 2010-02-01.21:00:58 text/html
unnamed dagit, 2010-02-02.14:44:06 text/html
unnamed dagit, 2010-02-03.02:02:01 text/html
Messages
msg9825 (view) Author: kowey Date: 2010-01-15.17:24:44
Nothing super-urgent, but it'd be nice to figure out how this happened, and how
we can fix it :-)

Notice the two steps forward and half-a-step back we took...

http://lists.osuosl.org/pipermail/darcs-users/2010-January/022832.html

=== darcs ===

                ||  darcs-2.2.1        | darcs-2.3.1       |  darcs
================++=====================+===================+===============
          check ||        11.7s 183.0M |       11.0s 55.0M |  42.4s 258.0M
         repair ||        11.8s 183.0M |       11.1s 55.0M |  44.2s 272.0M


=== ghc-hashed ===

                ||  darcs-2.2.1         | darcs-2.3.1        |   darcs
================++======================+====================+===============
          check ||       172.3s 1210.0M |      160.5s 146.0M |       -      -
         repair ||       173.0s 1287.0M |      127.7s 146.0M | 1190.6s 355.0M
msg9826 (view) Author: kowey Date: 2010-01-15.17:26:34
... also, as Ian points out, an interesting clue may be the fact that now repair
finishes but not check.

Why would that be?  Is it because of the files in pristine, that check has to
hold on to them to see if they match, and that repair can just ignore them?
msg9829 (view) Author: dagit Date: 2010-01-16.19:08:19
For the failed check, when I run that case manually I get this output:
The repository is consistent!                                                   
Hash mismatch(es)!                                                              
bindisttest/a/b
    index: 0000000000000000000000000000000000000000000000000000000000000000
  working: 5847c8b50a3191775d91cbaf8029fd1a75725517b0c70dde5bff759fc67ec1f8
bindisttest/a
    index: 0000000000000000000000000000000000000000000000000000000000000000
  working: e9f80a49790bc926a3de5542274ae979753a3aedaef1b9fab0fcb74a21bc7c60
bindisttest
    index: 3c99d780272456ac9e4ab10056ee3e145700260cefee61921bf89017a15a6535
  working: 5a72b14295811905378a648e98da91922bbc69f21c1a209357834fc2e92da3f8

Bad index.


The repository was fine, but the index has issues.

As for numbers related to the performance, I ran the above test with profiling enabled and collected these numbers:
        Sat Jan 16 01:52 2010 Time and Allocation Profiling Report  (Final)

           darcs-2.4-beta-prof +RTS -prof -RTS check

        total time  =      859.54 secs   (42977 ticks @ 20 ms)
        total alloc = 58,360,459,684 bytes  (excludes profiling overheads)

COST CENTRE                    MODULE               %time %alloc

sha256                         Bundled.SHA256        24.4    0.1
isGZFile                       ByteStringUtils       20.2    0.1
fsCreateHashedFile             Storage.Hashed.Darcs  12.3    0.2
hashedTreeIO                   Storage.Hashed.Darcs   9.8    4.7
break_before_nth_newline       ByteStringUtils        8.5    5.7
floatPath                      Storage.Hashed.AnchoredPath   4.4   16.8
removeFileMayNotExist          Darcs.Lock             3.6    0.0
base16                         Storage.Hashed.Hash    3.1   17.0
markChanged                    Storage.Hashed.Monad   1.1    2.0
modifyTree                     Storage.Hashed.Tree    0.8    2.6
applyHunkLines                 Darcs.Patch.Apply      0.7   20.9
break_after_nth_newline        ByteStringUtils        0.6    3.5
clean_hashdir                  Darcs.Repository.HashedIO   0.4    2.0
unlinesPS                      ByteStringUtils        0.3    1.9
darcsFormatDir                 Storage.Hashed.Darcs   0.3    1.5
ps2fn                          Darcs.Patch.FileName   0.3    2.4
sha256                         Storage.Hashed.Hash    0.2    1.5
breakup                        Darcs.Patch.FileName   0.2    1.1
breakFirstPS                   ByteStringUtils        0.2    1.2
firstspace                     ByteStringUtils        0.1    1.6

The expensive sha256 appears to be coming from here:
          mWriteFilePS_a4ntC Darcs.IO                                            1624      366704   0.1    0.0    28.8   14.5
           writeFile_aB9u Storage.Hashed.Monad                                1627     1466813   0.2    0.3    26.6    6.2
            maybeFlush   Storage.Hashed.Monad                                1648      733406   0.2    0.3     0.2    0.3
             flush       Storage.Hashed.Monad                                2079         226   0.0    0.0     0.0    0.0
            sha256       Storage.Hashed.Hash                                 1645      366703   0.0    0.0    23.9    0.1
             sha256      Bundled.SHA256                                      1646      733406  23.9    0.1    23.9    0.1

If I'm reading the profile correctly, that is called from here:
         applyAndFix     Darcs.Repository.Repair                             1392      105474   0.2    0.4    42.7   57.4

Which is in turn called from here:
        replayRepository' Darcs.Repository.Repair                             1236          28   0.0    0.0    95.1   96.5

Called from here:
       replayRepository  Darcs.Repository.Repair                             1235           2   0.0    0.0    99.3   98.9

And I think the entry point starts here:
 main                    Main                                                 936           5   0.0    0.0    99.8   99.3
  assertions             Main                                                3030           1   0.0    0.0     0.0    0.0
  run_the_command        Darcs.RunCommand                                     942           1   0.0    0.0    99.8   99.3
   run_command           Darcs.RunCommand                                    1004           1   0.0    0.0    99.8   99.3
    consider_running     Darcs.RunCommand                                    1006           3   0.0    0.0    99.8   99.3
     doCRCWarnings       Darcs.Commands.GZCRCs                               3032           1   0.0    0.0     0.0    0.0
      resetCRCWarnings   Darcs.Global                                        3035           1   0.0    0.0     0.0    0.0
      getCRCWarnings     Darcs.Global                                        3033           1   0.0    0.0     0.0    0.0
     checkCmd            Darcs.Commands.Check                                1153           1   0.0    0.0    99.8   99.3
      check'             Darcs.Commands.Check                                1234          13   0.0    0.0    99.8   99.3


Compiling those lists, snipping out the unrelated bits, and doctoring the whitespace:
                                                                                               individual    inherited
COST CENTRE              MODULE                                               no.    entries  %time %alloc   %time %alloc
 main                    Main                                                 936           5   0.0    0.0    99.8   99.3
  run_the_command        Darcs.RunCommand                                     942           1   0.0    0.0    99.8   99.3
   run_command           Darcs.RunCommand                                    1004           1   0.0    0.0    99.8   99.3
    consider_running     Darcs.RunCommand                                    1006           3   0.0    0.0    99.8   99.3
     checkCmd            Darcs.Commands.Check                                1153           1   0.0    0.0    99.8   99.3
      check'             Darcs.Commands.Check                                1234          13   0.0    0.0    99.8   99.3
       replayRepository  Darcs.Repository.Repair                             1235           2   0.0    0.0    99.3   98.9
        replayRepository' Darcs.Repository.Repair                            1236          28   0.0    0.0    95.1   96.5
         applyAndFix     Darcs.Repository.Repair                             1392      105474   0.2    0.4    42.7   57.4
          mWriteFilePS_a4ntC Darcs.IO                                        1624      366704   0.1    0.0    28.8   14.5
           writeFile_aB9u Storage.Hashed.Monad                               1627     1466813   0.2    0.3    26.6    6.2
            sha256       Storage.Hashed.Hash                                 1645      366703   0.0    0.0    23.9    0.1
             sha256      Bundled.SHA256                                      1646      733406  23.9    0.1    23.9    0.1

It seems like sha256 is incriminated simply because it's called so many times.  Which means it's probably worthwhile to look elsewhere for 
performance gains.

Another interesting bit, is that everything I snipped out between replayRepository' and applyAndFix contributes 0.0% to the the individual and 
inherited times, yet when we get to applyAndFix only half the time is spent at or below applyAndFix.  I think that means the rest of the time 
is spent mainly in isGZFile, fsCreateHashedFile, and hashedTreeIO (glancing at the profiling output also makes me think this).

I have no idea how to optimize this, but hopefully these numbers are insightful to someone else.  I'm also attaching the full profiling output 
in case someone wants to see it in its entirety.

Jason
Attachments
msg9830 (view) Author: tux_rocker Date: 2010-01-16.21:04:20
Op zaterdag 16 januari 2010 20:08 schreef Jason Dagit:
> Jason Dagit <dagitj@gmail.com> added the comment:

>For the failed check, when I run that case manually I get this output:

Hmmm, does it succeed when you run it manually, but fail when you run it from 
darcs-benchmark? Why?

> Another interesting bit, is that everything I snipped out between 
replayRepository' and applyAndFix contributes 0.0% to the the individual and 
> inherited times, yet when we get to applyAndFix only half the time is spent 
at or below applyAndFix.  I think that means the rest of the time 
> is spent mainly in isGZFile, fsCreateHashedFile, and hashedTreeIO (glancing 
at the profiling output also makes me think this).

isGZFile is defined in ByteStringUtils, so I'd expect it to show up in the 
profile. I can't find any occurrences of the string "fsCreateHashedFile" in 
the darcs source. hashedTreeIO is from hashed-storage, so may indeed eat a lot 
of time without showing up in the profile.

It may help to build without optimization and profile the unoptimized 
executable. While such a profile reflects less accurately what happens in an 
optimized executable, it may contain entries for some functions that do not 
appear in the profile of an optimized build due to inlining. It could be that 
isGZFile has been optimized away in this manner.
msg9831 (view) Author: dagit Date: 2010-01-16.23:46:49
On Sat, Jan 16, 2010 at 1:04 PM, Reinier Lamers <bugs@darcs.net> wrote:

>
> Reinier Lamers <tux_rocker@reinier.de> added the comment:
>
> Op zaterdag 16 januari 2010 20:08 schreef Jason Dagit:
> > Jason Dagit <dagitj@gmail.com> added the comment:
>
> >For the failed check, when I run that case manually I get this output:
>
> Hmmm, does it succeed when you run it manually, but fail when you run it
> from
> darcs-benchmark? Why?
>

I suspect that darcs-benchmark looks at the code returned by darcs which is
probably a failure exit code.  Some one should confirm this.


>
> > Another interesting bit, is that everything I snipped out between
> replayRepository' and applyAndFix contributes 0.0% to the the individual
> and
> > inherited times, yet when we get to applyAndFix only half the time is
> spent
> at or below applyAndFix.  I think that means the rest of the time
> > is spent mainly in isGZFile, fsCreateHashedFile, and hashedTreeIO
> (glancing
> at the profiling output also makes me think this).
>
> isGZFile is defined in ByteStringUtils, so I'd expect it to show up in the
> profile. I can't find any occurrences of the string "fsCreateHashedFile" in
> the darcs source. hashedTreeIO is from hashed-storage, so may indeed eat a
> lot
> of time without showing up in the profile.
>

fsCreateHashedFile shouldn't appear in the darcs source.  It's defined
in Storage.Hashed.Darcs,
which is in the hashed-storage package.


>
> It may help to build without optimization and profile the unoptimized
> executable. While such a profile reflects less accurately what happens in
> an
> optimized executable, it may contain entries for some functions that do not
> appear in the profile of an optimized build due to inlining. It could be
> that
> isGZFile has been optimized away in this manner.
>

Well....it was a bit of a mistake on my part to profile check.  Check, when
it succeeds, isn't that much different than before.  Repair is what I should
have profiled and reported on.  I knew that when I started the test the
night before but forgot it by morning.  Maybe later I can get those numbers.

Anyway, I don't think profiling without optimizations is going to be useful.

Jason
Attachments
msg9859 (view) Author: mornfall Date: 2010-01-19.02:21:42
This is now a lot better. I'll leave it to others whether this bug is resolved 
though -- there's probably still room for improvement (although much harder to 
achieve, I believe).
msg9890 (view) Author: kowey Date: 2010-01-22.11:56:17
Jason, sorry to nag you, but could you post some new numbers please? 
Maybe it's good to have darcs-2.3.1 darcs-2.4b2 darcs

where darcs is darcs-2.4b2 plus 'resolve issue1731: bump hashed-storage
dependency to 0.4.5'?

Thanks :-)
msg9895 (view) Author: dagit Date: 2010-01-22.16:09:07
On Fri, Jan 22, 2010 at 3:56 AM, Eric Kow <bugs@darcs.net> wrote:

>
> Eric Kow <kowey@darcs.net> added the comment:
>
> Jason, sorry to nag you, but could you post some new numbers please?
> Maybe it's good to have darcs-2.3.1 darcs-2.4b2 darcs
>
> where darcs is darcs-2.4b2 plus 'resolve issue1731: bump hashed-storage
> dependency to 0.4.5'?
>

What am I testing?

Jason
Attachments
msg9896 (view) Author: tux_rocker Date: 2010-01-22.20:13:07
Op vrijdag 22 januari 2010 12:56 schreef je:
> Eric Kow <kowey@darcs.net> added the comment:
> Jason, sorry to nag you, but could you post some new numbers please? 
> Maybe it's good to have darcs-2.3.1 darcs-2.4b2 darcs
> 
> where darcs is darcs-2.4b2 plus 'resolve issue1731: bump hashed-storage
> dependency to 0.4.5'?

darcs-2.4b2 includes 'resolve issue1731: bump hashed-storage dependency to 
0.4.5'. So that'd be "darcs-2.3.1 darcs-2.4b1 darcs-2.4b2".
msg9897 (view) Author: dagit Date: 2010-01-22.21:24:24
On Fri, Jan 22, 2010 at 12:13 PM, Reinier Lamers <bugs@darcs.net> wrote:

>
> Reinier Lamers <tux_rocker@reinier.de> added the comment:
>
> Op vrijdag 22 januari 2010 12:56 schreef je:
> > Eric Kow <kowey@darcs.net> added the comment:
> > Jason, sorry to nag you, but could you post some new numbers please?
> > Maybe it's good to have darcs-2.3.1 darcs-2.4b2 darcs
> >
> > where darcs is darcs-2.4b2 plus 'resolve issue1731: bump hashed-storage
> > dependency to 0.4.5'?
>
> darcs-2.4b2 includes 'resolve issue1731: bump hashed-storage dependency to
> 0.4.5'. So that'd be "darcs-2.3.1 darcs-2.4b1 darcs-2.4b2".
>

I guess what I still don't understand is how to get those versions of
darcs.  Are they tagged somewhere?  Could you please help me by giving me
simple instructions?  This will also help reduced confusion about what I did
versus what you expected me to do.

Thanks,
Jason
Attachments
msg9898 (view) Author: kowey Date: 2010-01-22.23:11:51
On Fri, Jan 22, 2010 at 13:24:15 -0800, Jason Dagit wrote:
> I guess what I still don't understand is how to get those versions of
> darcs.  Are they tagged somewhere?  Could you please help me by giving me
> simple instructions?  This will also help reduced confusion about what I did
> versus what you expected me to do.

Here's the branch:
  http://darcs.net/releases/branch-2.4

Could you build two separate versions of Darcs, one based on the tag
2.3.98.1, and one based on 2.3.98.2?  Note that you want to make sure
that the former (2.3.98.1 -- that's Darcs 2.4b1) is based on
hashed-storage 0.4.4.  The latter should naturally be based on
hashed-storage 0.4.5.

Is that enough detail?

-- 
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9
msg9903 (view) Author: tux_rocker Date: 2010-01-26.16:24:22
The following patch updated the status of issue1731 to be resolved:

* resolve issue1731: bump hashed-storage dependency to 0.4.5 
Ignore-this: c6e300b3c56d85b425306530b4d6c7be
msg9913 (view) Author: tux_rocker Date: 2010-02-01.12:26:47
Apparently, it's not as gone as we want it to be.

Jason posted this on the mailing list a couple of days a go, in the long
performance testing thread that many people apparently ignore:

                   || darcs-2.2.0         | darcs-2.2.1         |
darcs-2.3.1        | darcs-2.3.98.1        | darcs-2.3.98.2       
===================++=====================+=====================+====================+=======================+======================
             check ||      178.5s 1201.0M |      180.9s 1372.0M |     
153.4s 145.0M |              -      - |              -      -
            repair ||      174.0s 1204.0M |      178.2s 1207.0M |     
125.8s 146.0M |        1180.7s 354.0M |        1082.7s 355.0M

So even with hashed-storage 0.4.5, we're still horribly slow on
repairing the ghc repo! Petr, can you investigate?
msg9915 (view) Author: mornfall Date: 2010-02-01.19:47:48
Sorry, can't reproduce. Please double-check your setup, provide --exact-
version, &c. Thanks.

20:39:24 | morn@twi:~/dev/darcs/_test/hashed_ghc -> time darcs check 
The repository is consistent!                                                   
darcs check  179,75s user 8,12s system 79% cpu 3:57,14 total                    
20:43:27 | morn@twi:~/dev/darcs/_test/hashed_ghc -> time darcs check
The repository is consistent!                                                   
darcs check  161,02s user 5,37s system 98% cpu 2:48,72 total    

(The first is with cold cache.)
msg9916 (view) Author: dagit Date: 2010-02-01.21:00:58
On Mon, Feb 1, 2010 at 11:47 AM, Petr Ročkai <bugs@darcs.net> wrote:

>
> Petr Ročkai <me@mornfall.net> added the comment:
>
> Sorry, can't reproduce. Please double-check your setup, provide --exact-
> version, &c. Thanks.
>

I don't think the darcs I have supports reporting dependency versions in
--exact-version (I thought that was added to head but not in the release
branch).  Having said that, I was careful to select specific versions of
hashed-storage when building:
http://lists.osuosl.org/pipermail/darcs-users/2010-January/022978.html

Unless cabal ignored the --constraint we should be good to go.  I have had
cabal ignore constraints, but only constraints on base and there is a bug in
the cabal tracker for that issue.

No one else is seeing this?  Eric (or anyone else), do you have an OS X
machine where you could try to reproduce this?

Also, the run had produced this output:
http://lists.osuosl.org/pipermail/darcs-users/2010-January/022967.html

Which contains this gem:
"darcs-2.3.98.2 check [ghc-hashed]: 1... error: user error (darcs failed
with
error code 1"

Jason
Attachments
msg9920 (view) Author: reinier.lamers Date: 2010-02-02.08:56:19
2.3.98.2 depends on hashed-storage 0.4.5 so there is no way that it
could be using 0.4.4. It could be that your 2.3.98.1 was using 0.4.5,
but then we have another weird issue in its own right.

I'll have a short at reproducing this myself after work tonight.
Perhaps if some other people try, and no-one can reproduce, we can
judge that something unrelated must have gone wrong in Jason's
benchmark. I wouldn't conclude that from just one or two failures two
reproduce.
msg9927 (view) Author: kowey Date: 2010-02-02.14:36:48
On Mon, Feb 01, 2010 at 12:59:56 -0800, Jason Dagit wrote:
> No one else is seeing this?  Eric (or anyone else), do you have an OS X
> machine where you could try to reproduce this?

Alright, here are some steps I'm taking which may help others too.

I've used the new 'touch' matcher to pull in some patches on top of
darcs-2.3.98.1 and 2.3.98.2 respectively so that we have darcs
--exact-version reporting the version of hashed-storage being used.
I'll attach these as a bundle.

You can
  darcs get --context name-of-bundle
and
  darcs apply name-of-bundle

To get a similar version of darcs 2.3.98.2, you'll want to darcs get
http://darcs.net/releases/darcs-2.4 --tag 2.3.98.2 and apply this
bundle, minus the one JUNK patch

Anyway, with these versions of darcs, I can confirm that I'm using
hashed-storage 0.4.4 and 0.4.5 respectively

Numbers to follow...

-- 
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9
msg9928 (view) Author: kowey Date: 2010-02-02.14:38:18
On Tue, Feb 02, 2010 at 14:34:41 +0000, Eric Kow wrote:
> I'll attach these as a bundle.

Sigh...

-- 
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9
Attachments
msg9929 (view) Author: dagit Date: 2010-02-02.14:44:06
On Mon, Feb 1, 2010 at 11:47 AM, Petr Ročkai <bugs@darcs.net> wrote:

>
> Petr Ročkai <me@mornfall.net> added the comment:
>
> Sorry, can't reproduce. Please double-check your setup, provide --exact-
> version, &c. Thanks.
>
> 20:39:24 | morn@twi:~/dev/darcs/_test/hashed_ghc -> time darcs check
> The repository is consistent!
> darcs check  179,75s user 8,12s system 79% cpu 3:57,14 total
> 20:43:27 | morn@twi:~/dev/darcs/_test/hashed_ghc -> time darcs check
> The repository is consistent!
> darcs check  161,02s user 5,37s system 98% cpu 2:48,72 total
>

My attempt to reproduce it manually:

$ time darcs-2.3.98.2 check
The repository is consistent!

Hash mismatch(es)!

rts/hooks
    index: 029f75f696b3ff58a94ef54d847b22577998b6e02380e018fff30dc185ff6f0f
  working: b7baab3486b8ecea41b68bc7d0d8ff89574b01c8fd8ce854a7818f4fc6ea2e80
rts
    index: 67ab3c6e932c2c5ae514f0c52a46860f901b6dc522493faa4acce1f71133c9d8
  working: 3842af546a584f34aedf2df78e4205d1622f536db92bd049feb2ad12b52d26e6
bindisttest/a/b
    index: 0000000000000000000000000000000000000000000000000000000000000000
  working: 5847c8b50a3191775d91cbaf8029fd1a75725517b0c70dde5bff759fc67ec1f8
bindisttest/a
    index: 0000000000000000000000000000000000000000000000000000000000000000
  working: e9f80a49790bc926a3de5542274ae979753a3aedaef1b9fab0fcb74a21bc7c60
bindisttest
    index: 3c99d780272456ac9e4ab10056ee3e145700260cefee61921bf89017a15a6535
  working: 5a72b14295811905378a648e98da91922bbc69f21c1a209357834fc2e92da3f8

Bad index.

real 22m53.944s
user 4m25.249s
sys 1m40.398s
Attachments
msg9930 (view) Author: kowey Date: 2010-02-02.15:46:22
On Tue, Feb 02, 2010 at 06:43:02 -0800, Jason Dagit wrote:
> My attempt to reproduce it manually:

I can reproduce this!

> $ time darcs-2.3.98.2 check
> The repository is consistent!
> 
> Hash mismatch(es)!
> 
> rts/hooks
>     index: 029f75f696b3ff58a94ef54d847b22577998b6e02380e018fff30dc185ff6f0f
>   working: b7baab3486b8ecea41b68bc7d0d8ff89574b01c8fd8ce854a7818f4fc6ea2e80
> rts
>     index: 67ab3c6e932c2c5ae514f0c52a46860f901b6dc522493faa4acce1f71133c9d8
>   working: 3842af546a584f34aedf2df78e4205d1622f536db92bd049feb2ad12b52d26e6
> bindisttest/a/b
>     index: 0000000000000000000000000000000000000000000000000000000000000000
>   working: 5847c8b50a3191775d91cbaf8029fd1a75725517b0c70dde5bff759fc67ec1f8
> bindisttest/a
>     index: 0000000000000000000000000000000000000000000000000000000000000000
>   working: e9f80a49790bc926a3de5542274ae979753a3aedaef1b9fab0fcb74a21bc7c60
> bindisttest
>     index: 3c99d780272456ac9e4ab10056ee3e145700260cefee61921bf89017a15a6535
>   working: 5a72b14295811905378a648e98da91922bbc69f21c1a209357834fc2e92da3f8
> 
> Bad index.

Looks like we need to make a new bug of this.  I have a patched darcs
2.3.98.2 with the Cabal --exact-version patches, so I know this is with
hashed-storage 0.4.5

-- 
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9
msg9938 (view) Author: kowey Date: 2010-02-02.22:33:58
On Mon, Feb 01, 2010 at 12:26:50 +0000, Reinier Lamers wrote:
>                    || darcs-2.2.0         | darcs-2.2.1         | darcs-2.3.1        | darcs-2.3.98.1        | darcs-2.3.98.2       
> ===================++=====================+=====================+====================+=======================+======================
>              check ||      178.5s 1201.0M |      180.9s 1372.0M |      153.4s 145.0M |              -      - |              -      -
>             repair ||      174.0s 1204.0M |      178.2s 1207.0M |      125.8s 146.0M |        1180.7s 354.0M |        1082.7s 355.0M

Here are my numbers:

ghc-hashed
----------

+--------------------++-----------------------+-----------------------+
|                    || darcs-2.3.98.1        | darcs-2.3.98.2        |
+====================++=======================+=======================+
|         get (full) ||         721.0s  71.0M |         610.2s  70.0M |
|    get (lazy, x10) ||         121.0s   8.0M |         140.7s   8.0M |
|           pull 100 ||          14.0s  39.0M |          17.2s  37.0M |
|           annotate ||              -      - |              -      - |
|             wh x50 ||           5.6s   0.0M |           4.8s   0.0M |
|         wh mod x50 ||          11.3s   3.0M |          13.9s   3.0M |
|          wh -l x20 ||           4.7s   0.0M |           5.8s   0.0M |
|     record mod x10 ||          33.4s  53.0M |          43.4s  35.0M |
|     revert mod x50 ||          32.5s   8.0M |          26.9s   8.0M |
| (un)revert mod x10 ||          38.0s  48.0M |          33.3s  32.0M |
|              check ||              -      - |              -      - |
|             repair ||        2053.0s 353.0M |        1646.2s 355.0M |
|          pull 1000 ||          75.5s 195.0M |          92.7s 195.0M |
+--------------------++-----------------------+-----------------------+

-- 
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9
msg9939 (view) Author: dagit Date: 2010-02-03.02:02:01
On Tue, Feb 2, 2010 at 2:33 PM, Eric Kow <kowey@darcs.net> wrote:

> On Mon, Feb 01, 2010 at 12:26:50 +0000, Reinier Lamers wrote:
> >                    || darcs-2.2.0         | darcs-2.2.1         |
> darcs-2.3.1        | darcs-2.3.98.1        | darcs-2.3.98.2
> >
> ===================++=====================+=====================+====================+=======================+======================
> >              check ||      178.5s 1201.0M |      180.9s 1372.0M |
>  153.4s 145.0M |              -      - |              -      -
> >             repair ||      174.0s 1204.0M |      178.2s 1207.0M |
>  125.8s 146.0M |        1180.7s 354.0M |        1082.7s 355.0M
>
> Here are my numbers:
>
> ghc-hashed
> ----------
>
> +--------------------++-----------------------+-----------------------+
> |                    || darcs-2.3.98.1        | darcs-2.3.98.2        |
> +====================++=======================+=======================+
> |             wh x50 ||           5.6s   0.0M |           4.8s   0.0M |
> |          wh -l x20 ||           4.7s   0.0M |           5.8s   0.0M |
>

These entries that take 0.0M seem a little fishy to me.  For comparison,
here is the memory usage of a program, main = print "Hello, World!":

./helloworld +RTS -s
"Hello, World!"
           8,976 bytes allocated in the heap
             700 bytes copied during GC
           2,808 bytes maximum residency (1 sample(s))
          13,576 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

So anything that takes less than 1MB raises a red flag.  Either the
reporting of darcs-benchmark is doing something tricky or those commands
didn't run?

Jason
Attachments
msg9940 (view) Author: tux_rocker Date: 2010-02-03.08:31:31
2010/2/2 Eric Kow <kowey@darcs.net>:
> Here are my numbers:
>
> ghc-hashed
> ----------
>
> +--------------------++-----------------------+-----------------------+
> |                    || darcs-2.3.98.1        | darcs-2.3.98.2        |
> +====================++=======================+=======================+
> |              check ||              -      - |              -      - |
> |             repair ||        2053.0s 353.0M |        1646.2s 355.0M |
> +--------------------++-----------------------+-----------------------+

It would be especially interesting to see the difference between 2.3.1
and the 2.4 betas. For me, beta 3 is only two times slower, while for
Jason it is like eight times slower. The regression I see is
acceptable, the one Jason sees is not.

Jason, can you make a profile of your slow 'repair' run of 2.4 beta 3
on ghc-hashed?
msg9942 (view) Author: kowey Date: 2010-02-03.12:10:03
On Wed, Feb 03, 2010 at 08:31:34 +0000, Reinier Lamers wrote:
> > +====================++=======================+=======================+
> > |              check ||              -      - |              -      - |
> > |             repair ||        2053.0s 353.0M |        1646.2s 355.0M |
> > +--------------------++-----------------------+-----------------------+
> 
> It would be especially interesting to see the difference between 2.3.1
> and the 2.4 betas. For me, beta 3 is only two times slower, while for
> Jason it is like eight times slower. The regression I see is
> acceptable, the one Jason sees is not.

ghc-hashed
----------

+--------------------++--------------------+-----------------------+
|                    || darcs-2.3.1        | darcs-2.3.98.2        |
+====================++====================+=======================+
|         get (full) ||      584.9s  73.0M |         653.4s  70.0M |
|    get (lazy, x10) ||      131.9s   8.0M |         125.1s   8.0M |
|           pull 100 ||       19.3s  34.0M |          16.3s  40.0M |
|           annotate ||           -      - |              -      - |
|             wh x50 ||        2.2s   0.0M |           5.6s   0.0M |
|         wh mod x50 ||        7.6s   3.0M |          12.0s   3.0M |
|          wh -l x20 ||        8.8s   0.0M |           4.1s   0.0M |
|     record mod x10 ||       35.2s  55.0M |          30.9s  50.0M |
|     revert mod x50 ||       45.9s  14.0M |          23.3s   8.0M |
| (un)revert mod x10 ||       44.7s  33.0M |          27.6s  32.0M |
|              check ||      507.6s 142.0M |              -      - |
|             repair ||      394.4s 143.0M |        1356.6s 355.0M |
|          pull 1000 ||       90.2s 202.0M |         116.0s 195.0M |
+--------------------++--------------------+-----------------------+

I ran just the most recent beta.  Note that the 300s gap between my
first and second runs of darcs-2.3.98.2.

I think I need to code code up a flag to darcs-becnhmark that lets us
run just a set of benchmarks

-- 
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9
msg9943 (view) Author: mornfall Date: 2010-02-03.13:55:24
First, please keep this issue a little focussed. Dragging in other 
issues with benchmarking is not helping (i.e. the memory numbers are not 
pertinent to the check/repair times).

Second, please everyone make sure you are NEVER using profiling-enabled 
darcs for benchmarking. On my system, enabling profiling at *compile* 
time causes about 2-fold slowdown.

For ghc_hashed benchmark repo:

darcs-prof check  358,75s user 7,13s system 99% cpu 6:07,86 total
darcs check  175,88s user 5,59s system 98% cpu 3:04,27 total

Thanks.
msg9949 (view) Author: kowey Date: 2010-02-04.00:50:48
While Petr tracks down a memory leak, I thought I should post some new numbers.

It looks like the recent hashed-storage 0.4.6 fixes the darcs check error,
which as Jason correctly surmised, was simply darcs-benchmark reacting to the
check failing.

darcs-benchmark run darcs-2.3.1 darcs-2.3.98.1 darcs-2.3.98.2 darcs-2.3.99.1p1 / ghc-hashed --only check --only repair

ghc-hashed
----------

+--------++---------------+----------------+-----------------+------------------+
|        || darcs-2.3.1   | darcs-2.3.98.1 | darcs-2.3.98.2  | darcs-2.3.99.1p1 |
+========++===============+================+=================+==================+
|  check || 384.2s 142.0M |       -      - |        -      - |   1214.8s 355.0M |
| repair || 393.8s 142.0M | 1427.3s 353.0M |   951.7s 356.0M |    974.8s 358.0M |
+--------++---------------+----------------+-----------------+------------------+

-- 
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9
msg9950 (view) Author: tux_rocker Date: 2010-02-04.07:34:03
Some new numbers from me, comparing darcs 2.3.1 to darcs 2.4 beta 3. The
difference with the previous numbers is that this time the darcs 2.3.1
binary is not profiling-impaired.

I am still not able to reproduce Jason's observed regression.

=== ghc-hashed ===

+-----------------++-------------------------------------------+------------------------------------------+
|                 || darcs-2.3.1/dist/build/darcs/darcs        |
branch-2.4/dist/build/darcs/darcs        |
+=================++===========================================+==========================================+
|      get (full) ||                              16.7s  70.0M |       
                     16.8s  70.0M |
| get (lazy, x10) ||                              24.5s   9.0M |       
                     20.1s   9.0M |
|        pull 100 ||                              18.5s  59.0M |       
                     15.2s  51.0M |
|        annotate ||                                  -      - |       
                         -      - |
|          wh x50 ||                               3.6s   0.0M |       
                      7.0s   0.0M |
|      wh mod x50 ||                              10.2s   4.0M |       
                     12.9s   4.0M |
|       wh -l x20 ||                              11.9s  18.0M |       
                      5.8s   4.0M |
|           check ||                             538.9s 166.0M |       
                         -      - |
|          repair ||                             538.4s 166.0M |       
                    613.0s 420.0M |
|       pull 1000 ||                              84.8s 252.0M |       
                     81.6s 252.0M |
+-----------------++-------------------------------------------+------------------------------------------+
msg9951 (view) Author: tux_rocker Date: 2010-02-04.07:38:36
And don't those numbers for repair in Jason's benchmark look
suspiciously quick? What machine is able to repair the ghc repo in 125
seconds?

Perhaps there's some other issue going on there.
msg9952 (view) Author: kowey Date: 2010-02-04.09:09:30
On Thu, Feb 04, 2010 at 07:38:38 +0000, Reinier Lamers wrote:
> And don't those numbers for repair in Jason's benchmark look
> suspiciously quick? What machine is able to repair the ghc repo in 125
> seconds?

In case you're still up, Jason, what are the specs on your Mac?

I'm not entirely sure what's relevant, but I have a 2006 Macbook with:
 - 2 GHZ Intel Core Duo (ie the 32 bit one)
 - 2 GB RAM (667 MHz)
 - 7200 RPM SATA drive

> Perhaps there's some other issue going on there.

As for me,

Notice the variations in the numbers I've published on this thread
check/repair.  I don't think these are due to profiling -- I've only had
library profiling enabled the whole time.

Hmm, a 700s difference between the best and the worst run.  That sounds
unusual. My current going explanation is pure Erician stupidity (in that
I most likely wasn't being very disciplined about not doing anything
else on my machine but benchmarking at the times that gave us really
outlandish numbers.  In the future, I'll be sure to shut everything else
down.  Sorry for the noise!

repair
------
darcs 2.3.1      :  394.4s 143.0M
darcs 2.3.1      :  393.8s 142.0M

darcs 2.3.98.1   : 2053.0s 353.0M 
darcs 2.3.98.1   : 1427.3s 353.0M

darcs 2.3.98.2   : 1646.2s 355.0M
darcs 2.3.98.2   : 1356.6s 355.0M
darcs 2.3.98.2   :  951.7s 356.0M

darcs 2.3.99.1p1 :  974.8s 358.0M

check
-----
darcs 2.3.1: 507.6s 142.0M
darcs 2.3.1: 384.2s 142.0M
darcs 2.3.1: 360.8s 145.0M

darcs 2.3.99.1p1 : 1214.8s 355.0M
darcs 2.3.99.1p1 : 1003.2s 356.0M

-- 
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9
msg9953 (view) Author: mornfall Date: 2010-02-04.09:57:15
Ok, I have been tracking this down most of yesterday. This came down to a couple 
issues, namely syncing too often and when that was fixed, a somewhat nasty 
memory leak. I have fixed up both in hashed-storage HEAD, but currently dare not 
release a version to hackage, lest I break something. So I'd like to get some 
coverage and maybe review first.

Please pull and build from http://repos.mornfall.net/hashed-storage and run your 
benchmarks against latest darcs beta (ideally with my check-fixing-patch as 
well).

I am running benchmarks against 2.3 as I write this, so I'll report later about 
that.
msg9955 (view) Author: mornfall Date: 2010-02-04.10:56:53
=== darcs ===   
                
                || darcs-2.3        | darcs
================++==================+=============
     get (full) ||      2.3s  11.0M |  2.3s  10.0M
get (lazy, x10) ||      2.5s   3.0M |  2.8s   3.0M
       pull 100 ||      1.6s  25.0M |  1.5s  27.0M
       annotate ||     13.7s 181.0M | 14.9s 182.0M
         wh x50 ||      0.4s   0.0M |  1.1s   0.0M
     wh mod x50 ||      2.8s   2.0M |  3.4s   2.0M
      wh -l x20 ||      1.8s   5.0M |  0.9s   2.0M
          check ||      9.0s  27.0M | 11.6s  27.0M
         repair ||      9.0s  26.0M | 11.6s  27.0M
      pull 1000 ||      9.0s  40.0M |  8.2s  37.0M

=== ghc-hashed ===
                
                || darcs-2.3        |  darcs
================++==================+==============
     get (full) ||     10.5s  71.0M |  10.1s  71.0M
get (lazy, x10) ||      9.5s   7.0M |  10.3s   8.0M
       pull 100 ||      7.0s  36.0M |   6.4s  65.0M
       annotate ||         -      - |      -      -
         wh x50 ||      0.8s   0.0M |   2.4s   0.0M
     wh mod x50 ||      3.8s   3.0M |   5.3s   3.0M
      wh -l x20 ||      4.8s   9.0M |   2.2s   3.0M
          check ||    121.5s 152.0M | 121.8s 169.0M
         repair ||    121.2s 152.0M | 121.3s 177.0M
      pull 1000 ||     27.4s 201.0M |  25.2s 197.0M

Please note that check/repair in HEAD do more work than in 2.3 (that is, 
they also check the index which was not done in previous versions). So I 
think this is more than reasonable fix for the speed regression.

Now the question stands what about whatsnew... (but that's for another 
bug, unless people observe that hashed-storage HEAD is substantially 
slower on whatsnew than 0.4.6, that is).
msg9956 (view) Author: kowey Date: 2010-02-05.09:01:58
First: Jason reports that he had a refurb Macbook from 2008 with 4GB of RAM.

Second: Petr, I think you'll like this recent crop of MacOS X numbers. 
Hopefully roundup won't mess up these tables.  The darcs-2.3.99.1p2 is
the release candidate plus Petr's recent patch to compare against
working dir, built against head hashed-storage.

% darcs-benchmark run darcs-2.3.1 darcs-2.3.98.2 darcs-2.3.99.1p2 /
ghc-hashed --only check --only repair

+--------++--------------------+-----------------------+-------------------------+
|        || darcs-2.3.1        | darcs-2.3.98.2        |
darcs-2.3.99.1p2        |
+========++====================+=======================+=========================+
|  check ||      374.9s 142.0M |              -      - |          
354.5s 205.0M |
| repair ||      383.1s 142.0M |        1022.2s 358.0M |          
348.9s 203.0M |
+--------++--------------------+-----------------------+-------------------------+

I'll now do a full run with all benchmarks and report back tonight.
msg9957 (view) Author: kowey Date: 2010-02-05.09:10:34
On Fri, Feb 05, 2010 at 09:02:06 +0000, Eric Kow wrote:
> +--------++--------------------+-----------------------+-------------------------+
> |        || darcs-2.3.1        | darcs-2.3.98.2        |
> darcs-2.3.99.1p2        |

Feh... /me shakes his angry fists at roundup (or our current config)

ghc-hashed
----------

+--------++--------------------+-----------------------+-------------------------+
|        || darcs-2.3.1        | darcs-2.3.98.2        | darcs-2.3.99.1p2        |
+========++====================+=======================+=========================+
|  check ||      374.9s 142.0M |              -      - |           354.5s 205.0M |
| repair ||      383.1s 142.0M |        1022.2s 358.0M |           348.9s 203.0M |
+--------++--------------------+-----------------------+-------------------------+


-- 
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9
msg9958 (view) Author: kowey Date: 2010-02-05.21:32:15
This time, the full set of benchmarks:

ghc-hashed
----------

+--------------------++--------------------+-----------------------+-------------------------+
|                    || darcs-2.3.1        | darcs-2.3.98.2        | darcs-2.3.99.1p2        |
+====================++====================+=======================+=========================+
|         get (full) ||      642.7s  70.0M |         748.1s  70.0M |           613.5s  70.0M |
|    get (lazy, x10) ||      142.3s   8.0M |         136.5s   8.0M |           134.5s   8.0M |
|           pull 100 ||       15.6s  34.0M |          13.1s  32.0M |            16.9s  32.0M |
|           annotate ||           -      - |              -      - |                -      - |
|             wh x50 ||        2.0s   0.0M |           4.7s   0.0M |             3.9s   0.0M |
|         wh mod x50 ||        7.2s   3.0M |           9.9s   3.0M |             8.3s   4.0M |
|          wh -l x20 ||        8.5s   0.0M |           4.6s   0.0M |             3.5s   0.0M |
|     record mod x10 ||       31.8s  55.0M |          31.2s  68.0M |            28.4s  68.0M |
|     revert mod x50 ||       38.4s  12.0M |          21.4s   8.0M |            21.7s   8.0M |
| (un)revert mod x10 ||       35.6s  33.0M |          28.7s  32.0M |            27.2s  32.0M |
|              check ||      478.9s 142.0M |              -      - |           427.7s 203.0M |
|             repair ||      650.2s 143.0M |        1067.7s 355.0M |           577.9s 204.0M |
|          pull 1000 ||       71.1s 202.0M |          77.2s 195.0M |           102.8s 203.0M |
+--------------------++--------------------+-----------------------+-------------------------+

-- 
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9
msg9959 (view) Author: quick Date: 2010-02-06.15:03:14
For comparison, my local repo.  The last column is the latest hashed-
storage from Petr's repo.

=== per1 ===

                || darcs2.3.1        | darcs.net        | 
darcs-2.3.99.1        | darcs-2.3.99.1-hs+       
================++===================+==================
+=======================+==========================
     get (full) ||      50.9s 159.0M |     48.1s 159.0M |          
48.0s 159.0M |              53.3s 159.0M
get (lazy, x10) ||     165.2s   7.0M |    170.6s   7.0M |         
178.5s   6.0M |             173.5s   6.0M
       pull 100 ||      25.0s  23.0M |      7.9s  23.0M |           
7.5s  23.0M |               7.9s  23.0M
       annotate ||          -      - |         -      - |              
-      - |                  -      -
         wh x50 ||      12.1s   0.0M |     13.5s   0.0M |          
13.4s   0.0M |              12.5s   0.0M
     wh mod x50 ||      12.2s   6.0M |     14.0s   4.0M |          
13.8s   4.0M |              12.4s   4.0M
      wh -l x20 ||     566.1s 127.0M |     18.5s   6.0M |          
18.2s   6.0M |              16.5s   5.0M
          check ||      56.5s 275.0M |    142.2s 411.0M |         
113.2s 408.0M |              60.2s 287.0M
         repair ||      55.4s 275.0M |    138.8s 389.0M |         
110.1s 386.0M |              59.7s 287.0M
      pull 1000 ||      69.0s  84.0M |     63.1s  85.0M |          
62.0s  88.0M |              62.0s  88.0M
msg10215 (view) Author: tux_rocker Date: 2010-03-15.11:00:50
The following patch updated the status of issue1731 to be resolved:

* resolve issue1731: bump hashed-storage dependency to 0.4.5 
Ignore-this: c6e300b3c56d85b425306530b4d6c7be
History
Date User Action Args
2010-01-15 17:24:49koweycreate
2010-01-15 17:26:36koweysetmessages: + msg9826
2010-01-16 19:08:29dagitsetfiles: + darcs-2.4-beta-prof.prof
nosy: + dagit
messages: + msg9829
2010-01-16 21:04:25tux_rockersetnosy: + tux_rocker
messages: + msg9830
2010-01-16 23:46:53dagitsetfiles: + unnamed
messages: + msg9831
2010-01-19 02:21:44mornfallsetstatus: needs-reproduction -> waiting-for
nosy: + mornfall
messages: + msg9859
2010-01-22 11:56:18koweysetstatus: waiting-for -> needs-reproduction
assignedto: dagit
messages: + msg9890
2010-01-22 16:09:10dagitsetfiles: + unnamed
messages: + msg9895
2010-01-22 20:13:09tux_rockersetmessages: + msg9896
2010-01-22 21:24:26dagitsetfiles: + unnamed
messages: + msg9897
2010-01-22 23:11:54koweysetmessages: + msg9898
2010-01-26 16:24:25tux_rockersetstatus: needs-reproduction -> resolved
messages: + msg9903
2010-02-01 12:26:50tux_rockersetstatus: resolved -> needs-reproduction
assignedto: dagit -> mornfall
messages: + msg9913
2010-02-01 19:47:51mornfallsetstatus: needs-reproduction -> waiting-for
messages: + msg9915
2010-02-01 21:01:01dagitsetfiles: + unnamed
messages: + msg9916
2010-02-02 08:56:22reinier.lamerssetnosy: + reinier.lamers
messages: + msg9920
2010-02-02 14:36:51koweysetmessages: + msg9927
2010-02-02 14:38:20koweysetfiles: + include-the-exact-package-versions-darcs-was-compiled-with-in-__exact_version_.dpatch
messages: + msg9928
2010-02-02 14:44:08dagitsetfiles: + unnamed
messages: + msg9929
2010-02-02 15:46:24koweysetmessages: + msg9930
2010-02-02 22:34:02koweysetmessages: + msg9938
2010-02-03 02:02:16dagitsetfiles: + unnamed
messages: + msg9939
2010-02-03 08:26:03tux_rockersetnosy: - reinier.lamers
2010-02-03 08:31:34tux_rockersetmessages: + msg9940
2010-02-03 12:10:09koweysetmessages: + msg9942
2010-02-03 13:55:27mornfallsetmessages: + msg9943
2010-02-04 00:50:51koweysetmessages: + msg9949
2010-02-04 07:34:06tux_rockersetmessages: + msg9950
2010-02-04 07:38:38tux_rockersetmessages: + msg9951
2010-02-04 09:09:36koweysetmessages: + msg9952
2010-02-04 09:57:17mornfallsetmessages: + msg9953
2010-02-04 10:56:55mornfallsetmessages: + msg9955
2010-02-05 09:02:06koweysetstatus: waiting-for -> has-patch
messages: + msg9956
2010-02-05 09:10:39koweysetmessages: + msg9957
2010-02-05 21:32:18koweysetmessages: + msg9958
2010-02-06 15:03:17quicksetnosy: + quick
messages: + msg9959
2010-03-15 11:00:55tux_rockersetstatus: has-patch -> resolved
messages: + msg10215