darcs

Issue 973 darcs whatsnew significantly slower on hashed format (ghc 6.8.2 bug affecting Debian and Ubuntu)

Title darcs whatsnew significantly slower on hashed format (ghc 6.8.2 bug affecting Debian and Ubuntu)
Priority urgent Status resolved
Milestone Resolved in
Superseder Nosy List darcs-devel, dmitry.kurochkin, ertai, gwern, kowey, markstos, mornfall, simonmar, thorkilnaur, tommy, zooko
Assigned To mornfall
Topics Performance

Created on 2008-07-25.10:10:25 by kowey, last changed 2009-10-24.00:05:57 by admin.

Messages
msg5264 (view) Author: kowey Date: 2008-07-25.10:10:21
I haven't seen a ticket on this yet.

Simon M mentions that the hashed format is slow enough for darcs
whatsnew that he has switched back to the original format.

Presumably, if we fix whatever makes the hashed format significantly
slower, this will also translate to gains for the darcs-2 format.  It
would be useful if somebody would check and see if how the hashed
format compares with darcs-2.

If only there was more work going on with the maybench ([darcs]
benchmarking) project...
msg5266 (view) Author: kowey Date: 2008-07-25.10:13:42
Adding relevant people and marking urgent.  

Darcs-2 performance is a big barrier to adoption for people with biggish
repositories, which is bad because biggish repositories are especially
susceptible to the kinds of problems that darcs fixes.  Plus, the hashed format
thing should be a clue, eh?
msg5269 (view) Author: ertai Date: 2008-07-25.13:25:41
I remember that some "whatsnew" optimizations was harder to make with the      
                                                                               
                                                                               
hashed format.                                                                 
                                                                               
                                                                               
                                                                               
                                                                               
                                                                               
> If only there was more work going on with the maybench ([darcs]
> benchmarking) project...

My bad, partly...
msg5283 (view) Author: droundy Date: 2008-07-30.17:19:48
I haven't looked into this, and don't have time to look into it, but it's
probably just a strictness bug.

David
msg5304 (view) Author: mornfall Date: 2008-08-07.09:54:43
I have some possible insight on this issue. On systems using GHC 6.8.2 (not all
of them, but many -- including all Debian x86 installs from packages),
getSymbolicLinkStatus is utterly broken, returning bogus mtimes, which completely
kills any mtime-based optimisation. This means we have been basically running
with --ignore-times here... The difference is very significant on a mostly clean
working tree and much more so for hashed repository, as the slurpHashed there
computes sha256 of every file in pristine (through fetchFileUsingCachePrivate),
unlike the old-fashioned one.
msg5319 (view) Author: markstos Date: 2008-08-07.14:17:12
Given mornfall's comment, can we get anyone to re-test with 6.8.3 or newer as a
first step? 

I suppose a direct benchmark between 6.8.2 and a newer GHC would be especially
useful.
msg5340 (view) Author: mornfall Date: 2008-08-08.23:37:29
Ok, after fixing that mtime borkage, times on my repository (2200 files or so) go
like .5s with working mtime and 3.5 with non-working. However, on ghc repo, it
doesn't help, and now I found why: ghc-hashedrepo, as sitting on
darcs.haskell.org, is in some weird intermediate (not final) hashed format that
doesn't keep (original) filesizes in filenames. So in effect, it's running like
with --ignore-times still, because the HashedIO slurpy gives unknown_size and
smart_diff then decides it better should do a full diff, which it does. Which is
slow. Duh.

I am downlnoading ghc-HEAD-2008-07-24-ghc.tar.bz2 from darcs.haskell.org to
convert it to hashed myself tomorrow (it's going at 10K/s so no chance to do it
before passing out today).
msg5348 (view) Author: mornfall Date: 2008-08-09.11:27:30
13:10:46 | xroc@ann:~/dev/public/ghc-hashed -> time pdarcs-2.0.2-nolink wh -s --
No changes!
pdarcs-2.0.2-nolink wh -s --ignore-times  1,59s user 0,25s system 100% cpu 1,839
total
13:10:48 | xroc@ann:~/dev/public/ghc-hashed -> time pdarcs-2.0.2-nolink wh -s      
No changes!
pdarcs-2.0.2-nolink wh -s  0,26s user 0,06s system 99% cpu 0,320 total


13:11:33 | xroc@ann:~/dev/public/ghc -> time pdarcs-2.0.2-nolink wh -s --ignore-
No changes!
pdarcs-2.0.2-nolink wh -s --ignore-times  0,38s user 0,14s system 99% cpu 0,525
total
13:11:34 | xroc@ann:~/dev/public/ghc -> time pdarcs-2.0.2-nolink wh -s 
No changes!                             
pdarcs-2.0.2-nolink wh -s  0,04s user 0,04s system 97% cpu 0,086 total

Hmm, it still is a fair bit slower. From a profile, the old-fashioned version
takes .06 secs of cpu time, which isn't much more than git, really. It could
possibly be optimized further nevertheless.

Looking at the hashed profile, I see
slh                            Darcs.Repository.HashedIO  30.8   18.1
(this is the HashedIO slurpy)


and:
              readroot   Darcs.Repository.HashedIO                           
946         630   0.0    0.1    46.2   65.3
               readdir   Darcs.Repository.HashedIO                           
952        2417   0.0    1.7    38.5   64.5
                unpackPS FastPackedString                                   
1042        1208  15.4   42.4    15.4   42.4

and atop of that

             slh         Darcs.Repository.HashedIO                           
945        6603  30.8   18.1    84.6   90.8

So basically all the extra time (90 % of those 13 20ms ticks) is spent in
HashedIO Slurpy (well, not really that surprising, is it?). Now that unpackPS
figure could be addressed by using PackedString for the hashes (instead of
String), although it does seem to propagate quite far (it might even be
responsible for other bits of slowness, though -- the Hashed path does a fair bit
of string fiddling it seems).
msg5378 (view) Author: mornfall Date: 2008-08-11.15:22:10
I think we need to address some things here:
1) try to ensure we are not using broken ghc to build darcs -- and warn about it
if we do (alternatively, never use getSymbolicLinkStatus to get mtime)
2) warn users about obsolete repository formats (no size in hashed filename)
3) improve the actual performance of whatsnew on hashed, which is still .32
seconds compared to 0.09 or so (it is fairly bearable I think, but could use
improving)
msg5387 (view) Author: mornfall Date: 2008-08-11.23:39:51
For the bogus getSymbolicLinkStatus mtime problem (which by the way also makes
fileSize results bogus), see http://hackage.haskell.org/trac/ghc/ticket/2093

It has been fixed for 6.8.3 the ticket says, so we should probably just blacklist
6.8.2 for building darcs? Note that 6.8.2 is what Debian Lenny will release with,
so we are sort of screwed on that front. I have no idea about status of other
major distributions.

Igloo has suggested that it should be possible to work around that problem on
darcs side by foreign-importing lstat ourselves.
msg5395 (view) Author: kowey Date: 2008-08-12.09:31:05
Well, do we do have a Workaround.hs mechanism that we could use to implement our
own getSymbolicLinkStatus (Ian's solution).  Is there a way to detect buggy
getSymbolicLinkStatus directly (as opposed to checking GHC version?)

In any case, we should probably enter it into darcs folklore that compiling with
GHC 6.8.3 is way preferable to 6.8.2, for performance reasons.  We would need a
comparison, though... especially under Windows.
msg5396 (view) Author: mornfall Date: 2008-08-12.09:39:36
I actually don't think this applies to Windows at all. The comparison goes as
3.5s vs .5s on a medium-sized repository (a little bigger than ghc I think, at
least to number of files -- smaller in number of patches). For whatsnew, that is.
I think that's a good enough comparison -- 600% speedup?
msg5511 (view) Author: gwern Date: 2008-08-14.17:34:38
I recently converted a up-to-date GHC repo to darcs-2 format. Here's a timing
with the original:

gwern@craft:16593~/bin/ghc>time =darcs query repo                              
                                      [ 1:30PM]
          Type: darcs
          Show: Repo "/home/gwern/bin/ghc" [FixFilePath "/home/gwern/bin/ghc"
"",Verbose] (RF [["darcs-1.0"]]) (DarcsRepository (PlainPristine
"_darcs/pristine") thisrepo:/home/gwern/bin/ghc
                cache:/home/gwern/.darcs/cache
                )
        Format: darcs-1.0
          Root: /home/gwern/bin/ghc
      Pristine: PlainPristine "_darcs/pristine"
         Cache: thisrepo:/home/gwern/bin/ghc, cache:/home/gwern/.darcs/cache
boringfile Pref: .darcs-boring
Default Remote: http://darcs.haskell.org/ghc
   Num Patches: 19263
=darcs query repo  0.05s user 0.02s system 32% cpu 0.209 total

gwern@craft:16589~/bin/ghc>time =darcs whatsnew                                
                                      [ 1:29PM]
diffing dir...
binary ./WindowsInstaller/Glasgow\32\Haskell\32\Compiler.ism
rmfile ./WindowsInstaller/Glasgow\32\Haskell\32\Compiler.ism
=darcs whatsnew  0.04s user 0.01s system 44% cpu 0.107 total

Okay, that's good. No one can complain about a 0.1s whatsnew.

On the converted repo:

gwern@craft:16597~/bin/ghc-darcs2>time =darcs query repo                       
                                      [ 1:31PM]
          Type: darcs                                
          Show: Repo "/home/gwern/bin/ghc-darcs2" [FixFilePath
"/home/gwern/bin/ghc-darcs2" "",Verbose] (RF [["hashed"],["darcs-2"]])
(DarcsRepository HashedPristine thisrepo:/home/gwern/bin/ghc-darcs2
                cache:/home/gwern/.darcs/cache
                )
        Format: hashed, darcs-2
          Root: /home/gwern/bin/ghc-darcs2
      Pristine: HashedPristine
         Cache: thisrepo:/home/gwern/bin/ghc-darcs2, cache:/home/gwern/.darcs/cache
   Num Patches: 19263
=darcs query repo  0.21s user 0.02s system 72% cpu 0.310 total

gwern@craft:16600~/bin/ghc-darcs2>time =darcs whatsnew                         
                                      [ 1:31PM]
diffing dir...
binary ./WindowsInstaller/Glasgow\32\Haskell\32\Compiler.ism
rmfile ./WindowsInstaller/Glasgow\32\Haskell\32\Compiler.ism
=darcs whatsnew  0.10s user 0.03s system 55% cpu 0.236 total

I would be more concerned about this doubling of time, if the base weren't so
small. :)

The important thing is that I'm on a 64-bit Ubuntu Heron, so I avoid the
mentioned file bug. I don't think Windows has anything to do with it; so I agree
with Mornfall here.

Given this major performance regression - and also the bug with mixing -profile
and -threaded - telling people to avoid 6.8.2 in favor of 6.8.3 is starting to
sound really good to me (as opposed to doing funky FFI stuff and also
complicating the autoconf stuff even further).
msg5879 (view) Author: markstos Date: 2008-09-01.21:48:01
I'm not sure how helpful this is, but I filled a GHC bug with Ubuntu regarding
6.8.2:

https://bugs.launchpad.net/ubuntu/+source/ghc6/+bug/263773
msg5883 (view) Author: markstos Date: 2008-09-02.18:48:12
Ubuntu is taking the GHC bug report seriously, but they are on the fence about
whether they can get it done in time for Intreprid, due to manpower. Is there
anyone from the darcs camp that could help? And could a Haskeller pass this on
to the Haskell community, since it affects all GHC uses, and not just darcs? 

You can follow the discussion to date here:
https://bugs.launchpad.net/ubuntu/+source/ghc6/+bug/263773

It doesn't sound like hard work, just a lot of it.
msg6168 (view) Author: mornfall Date: 2008-09-29.11:39:58
A version of ghc 6.8.2 with the relevant fix backported has been uploaded to
Debian unstable. Thanks Kari Pahula. We'll see what can be done about pushing
that fix through to Lenny...
msg6170 (view) Author: markstos Date: 2008-09-29.13:46:04
mornfall, 

Would you be able to test that binary generated for Ubuntu is fixed as well?

http://launchpadlibrarian.net/18008357/darcs_2.0.2-2ubuntu2_i386.deb

Their bug report is here:
https://bugs.launchpad.net/ubuntu/+source/ghc6/+bug/263773
msg6171 (view) Author: mornfall Date: 2008-09-29.14:28:13
Yes, I can confirm the Ubuntu deb being fixed. I am uploading fixed darcs to
Debian unstable in a little while as well.
msg6173 (view) Author: markstos Date: 2008-09-29.15:06:50
So this is ticket resolvable now, or is there another part to 
"darcs whatsnew significantly slower on hashed format" that is separate from the
ghc 6.8.2 bug?

Thanks to everyone who helped address this!
msg6178 (view) Author: markstos Date: 2008-09-30.13:46:12
> Why did they just take that one fix from 6.8.3?  There were a bunch of other 
> really important fixes in 6.8.3, and since you still have to recompile all the
> libraries anyway it would seem sensible to just move to 6.8.3.

Simon,

They explained this in the Ubuntu bug thread:

1. There is not time for a full re-build of everything that is effected before
the Intrepid release-- it was considered. 
2. It was determined that this one fix could be done without breaking the ABI,
and be done in time. 

https://bugs.launchpad.net/ubuntu/+source/ghc6/+bug/263773
msg6588 (view) Author: kowey Date: 2008-11-04.12:47:04
Simon Marlow : you reported this offhand, that hashed repositories were
significantly slower than old-fashioned ones.  The fix has been to compile darcs
with GHC 6.8.3.  I'm going to mark this as resolved.  If you notice this
behaviour still (perhaps under Windows? Petr says the bug/fix does not affect
Windows in any way), please open a new ticket
msg6592 (view) Author: markstos Date: 2008-11-04.15:50:08
I agree with the resolved status, but want to note that I think both Debian and
Ubuntu patched their versions of GHC 6.8.2 to quit having the related bug as well.
msg7319 (view) Author: kowey Date: 2009-02-15.18:46:31
Just noting from    
http://packages.debian.org/changelogs/pool/main/g/ghc6/ghc6_6.8.2dfsg1-1/changelog 
that the patched version of GHC 6.8.2 seems to have made it in indeed.  In
particular, this is closed: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=500407
msg7398 (view) Author: zooko Date: 2009-03-04.15:49:32
I guess this has been resolved by Debian and Ubuntu patching their ghc 6.8.2.
History
Date User Action Args
2008-07-25 10:10:25koweycreate
2008-07-25 10:11:00koweylinkissue972 superseder
2008-07-25 10:13:48koweysetpriority: urgent
nosy: + droundy, simonmar
topic: + Performance
status: unread -> unknown
messages: + msg5266
2008-07-25 13:25:44ertaisetnosy: + ertai
messages: + msg5269
2008-07-30 17:19:51droundysetnosy: droundy, tommy, beschmi, kowey, dagit, simonmar, ertai
messages: + msg5283
2008-08-07 09:54:47mornfallsetnosy: + mornfall
messages: + msg5304
2008-08-07 13:34:59markstossetnosy: + markstos
2008-08-07 14:17:15markstossetnosy: droundy, tommy, beschmi, kowey, markstos, dagit, simonmar, ertai, mornfall
messages: + msg5319
2008-08-08 23:37:33mornfallsetnosy: droundy, tommy, beschmi, kowey, markstos, dagit, simonmar, ertai, mornfall
messages: + msg5340
2008-08-09 11:27:34mornfallsetnosy: droundy, tommy, beschmi, kowey, markstos, dagit, simonmar, ertai, mornfall
messages: + msg5348
2008-08-11 15:22:13mornfallsetnosy: droundy, tommy, beschmi, kowey, markstos, dagit, simonmar, ertai, mornfall
messages: + msg5378
assignedto: mornfall
2008-08-11 23:39:54mornfallsetnosy: droundy, tommy, beschmi, kowey, markstos, dagit, simonmar, ertai, mornfall
messages: + msg5387
2008-08-12 09:31:08koweysetnosy: droundy, tommy, beschmi, kowey, markstos, dagit, simonmar, ertai, mornfall
messages: + msg5395
2008-08-12 09:39:39mornfallsetnosy: droundy, tommy, beschmi, kowey, markstos, dagit, simonmar, ertai, mornfall
messages: + msg5396
2008-08-12 16:36:45koweylinkissue986 superseder
2008-08-14 17:11:29gwernsetnosy: + darcs-devel, gwern
2008-08-14 17:34:40gwernsetnosy: droundy, tommy, beschmi, kowey, markstos, darcs-devel, dagit, simonmar, gwern, ertai, mornfall
messages: + msg5511
2008-09-01 21:48:03markstossetnosy: droundy, tommy, beschmi, kowey, markstos, darcs-devel, dagit, simonmar, gwern, ertai, mornfall
messages: + msg5879
2008-09-02 18:48:14markstossetnosy: droundy, tommy, beschmi, kowey, markstos, darcs-devel, dagit, simonmar, gwern, ertai, mornfall
messages: + msg5883
2008-09-29 11:40:00mornfallsetnosy: + simon, thorkilnaur
messages: + msg6168
2008-09-29 13:46:06markstossetnosy: droundy, tommy, beschmi, kowey, markstos, darcs-devel, dagit, simonmar, simon, thorkilnaur, gwern, ertai, mornfall
messages: + msg6170
title: darcs whatsnew significantly slower on hashed format -> darcs whatsnew significantly slower on hashed format (ghc 6.8.2 bug affecting Debian and Ubuntu)
2008-09-29 14:28:14mornfallsetnosy: droundy, tommy, beschmi, kowey, markstos, darcs-devel, dagit, simonmar, simon, thorkilnaur, gwern, ertai, mornfall
messages: + msg6171
2008-09-29 15:06:51markstossetnosy: droundy, tommy, beschmi, kowey, markstos, darcs-devel, dagit, simonmar, simon, thorkilnaur, gwern, ertai, mornfall
messages: + msg6173
2008-09-30 13:46:15markstossetnosy: droundy, tommy, beschmi, kowey, markstos, darcs-devel, dagit, simonmar, simon, thorkilnaur, gwern, ertai, mornfall
messages: + msg6178
2008-11-04 12:47:06koweysetnosy: + dmitry.kurochkin
messages: + msg6588
2008-11-04 12:50:22koweysetstatus: unknown -> resolved
nosy: droundy, tommy, beschmi, kowey, markstos, darcs-devel, dagit, simonmar, simon, thorkilnaur, gwern, ertai, dmitry.kurochkin, mornfall
2008-11-04 15:50:10markstossetstatus: resolved -> unknown
nosy: droundy, tommy, beschmi, kowey, markstos, darcs-devel, dagit, simonmar, simon, thorkilnaur, gwern, ertai, dmitry.kurochkin, mornfall
messages: + msg6592
2008-11-04 16:52:39koweysetstatus: unknown -> resolved
nosy: droundy, tommy, beschmi, kowey, markstos, darcs-devel, dagit, simonmar, simon, thorkilnaur, gwern, ertai, dmitry.kurochkin, mornfall
2009-02-15 18:46:34koweysetstatus: resolved -> unknown
nosy: droundy, tommy, beschmi, kowey, markstos, darcs-devel, dagit, simonmar, simon, thorkilnaur, gwern, ertai, dmitry.kurochkin, mornfall
messages: + msg7319
2009-02-15 20:21:28droundysetnosy: - darcs-devel
2009-03-04 15:49:35zookosetstatus: unknown -> resolved
nosy: + zooko
messages: + msg7398
2009-08-06 17:59:23adminsetnosy: + jast, Serware, darcs-devel, - droundy, simonmar, gwern, ertai
2009-08-06 21:10:02adminsetnosy: - beschmi
2009-08-10 22:21:55adminsetnosy: + gwern, simonmar, ertai, - darcs-devel, jast, Serware
2009-08-11 00:19:31adminsetnosy: - dagit
2009-08-25 17:29:47adminsetnosy: + darcs-devel, - simon
2009-08-27 14:15:54adminsetnosy: tommy, kowey, markstos, darcs-devel, zooko, simonmar, thorkilnaur, gwern, ertai, dmitry.kurochkin, mornfall
2009-10-23 22:38:29adminsetnosy: + marlowsd, - simonmar
2009-10-23 22:40:54adminsetnosy: + nicolas.pouillard, - ertai
2009-10-23 23:36:58adminsetnosy: + simonmar, - marlowsd
2009-10-24 00:05:57adminsetnosy: + ertai, - nicolas.pouillard