Created on 2010-08-10.02:57:12 by dagit, last changed 2011-05-10.21:06:10 by darcswatch. Tracked on DarcsWatch.
See mailing list archives
for discussion on individual patches.
msg12079 (view) |
Author: dagit |
Date: 2010-08-10.02:57:12 |
|
Hello,
This is my attempt to improve the performance now that my refactor was
accepted.
The majority of these patches happened as a natural part of me staring
at the code thinking of how to optimize it. Which is to say, if the
code was ugly and I had to stare at it I fet compelled to tidy it up.
This refactor comes with really only 3 optimizations:
Sun Aug 8 19:04:46 PDT 2010 Jason Dagit <dagit@codersbase.com>
* tidy up skipNewline
Sun Aug 8 19:04:57 PDT 2010 Jason Dagit <dagit@codersbase.com>
* tidy up linesStartingWith
Sun Aug 8 19:05:39 PDT 2010 Jason Dagit <dagit@codersbase.com>
* optimize char parser
The main one is the last one, "optimize char parser". What I can't
tell is how much this stuff helps.
As for why is my new parser slower than the old one? I tracked one
culprit down to the removal of maybeWork. That patch changed the
linesStartingWith parser among other things so I've included what I
hope is a faster version of that function. I removed the reversed
list building in favor of killing the tail call optimization.
Is there a way to know if this was a bad idea? On the surface, it
avoids one O(n) operation, namely reverse. It also removes the tail
recursion opportunity, but Haskell is a lazy language so it's hard to
predict, in my opinion, what the outcome will be. My benchmarking
show that it took just as much memory either way and the runtime
didn't seem to change either.
In summary, take the three patches mentioned above for the performance
win, and take the rest if you're okay with it.
Thanks,
Jason
17 patches for repository http://darcs.net:
Sun Aug 8 13:05:33 PDT 2010 Jason Dagit <dagit@codersbase.com>
* handle whitespace lexing more intelligently in Read.hs
Sun Aug 8 13:11:40 PDT 2010 Jason Dagit <dagit@codersbase.com>
* improve interface of peekforw and bracketedFL
Sun Aug 8 16:34:24 PDT 2010 Jason Dagit <dagit@codersbase.com>
* tidy up readPatchInfo
Sun Aug 8 16:34:41 PDT 2010 Jason Dagit <dagit@codersbase.com>
* improve string parser
Sun Aug 8 17:18:09 PDT 2010 Jason Dagit <dagit@codersbase.com>
* remove unused word8 parser
Sun Aug 8 17:18:50 PDT 2010 Jason Dagit <dagit@codersbase.com>
* tidy up anyChar parser
Sun Aug 8 19:04:15 PDT 2010 Jason Dagit <dagit@codersbase.com>
* reorder alternatives in readPrim to optimize for the critical path
Sun Aug 8 19:04:46 PDT 2010 Jason Dagit <dagit@codersbase.com>
* tidy up skipNewline
Sun Aug 8 19:04:57 PDT 2010 Jason Dagit <dagit@codersbase.com>
* tidy up linesStartingWith
Sun Aug 8 19:05:39 PDT 2010 Jason Dagit <dagit@codersbase.com>
* optimize char parser
Sun Aug 8 19:05:47 PDT 2010 Jason Dagit <dagit@codersbase.com>
* remove unneeded export
Sun Aug 8 19:15:22 PDT 2010 Jason Dagit <dagit@codersbase.com>
* tidy up readPatches
Sun Aug 8 19:21:15 PDT 2010 Jason Dagit <dagit@codersbase.com>
* remove unused import
Sun Aug 8 19:21:45 PDT 2010 Jason Dagit <dagit@codersbase.com>
* tidy up peekforc
Sun Aug 8 19:34:45 PDT 2010 Jason Dagit <dagit@codersbase.com>
* fix the haddocks in ReadMonads.hs
Sun Aug 8 19:44:22 PDT 2010 Jason Dagit <dagit@codersbase.com>
* tidy up Real.hs and refactor space lexing
Sun Aug 8 20:28:21 PDT 2010 Jason Dagit <dagit@codersbase.com>
* tidy up Info.hs
The comment about break no longer applies as break is no longer used.
Using choice instead of (<|>) keeps linesStartingWithEndingWith a
little prettier.
Attachments
|
msg12081 (view) |
Author: dagit |
Date: 2010-08-10.03:04:52 |
|
What I said about the improvement patches has a minor typo. I forgot
that:
Sun Aug 8 13:05:33 PDT 2010 Jason Dagit <dagit@codersbase.com>
* handle whitespace lexing more intelligently in Read.hs
Also makes an important difference, along with the refactor to the string
parser.
|
msg12083 (view) |
Author: mornfall |
Date: 2010-08-10.11:16:35 |
|
Hi,
Jason Dagit <bugs@darcs.net> writes:
> This is my attempt to improve the performance now that my refactor was
> accepted.
>
> The majority of these patches happened as a natural part of me staring
> at the code thinking of how to optimize it. Which is to say, if the
> code was ugly and I had to stare at it I fet compelled to tidy it up.
I am glad that you are working on this. However, there may be still some
work left, since with this bundle applied, we are getting failures in
the testsuite.
You can see the output at
eg. http://buildbot.darcs.net/builders/6.10.4%20Debian%20TRY/builds/8/steps/test/logs/stdio
also, it seems (but I am not sure, I haven't studied the numbers closely
enough) that the refactor without this bundle performs better than after
this bundle. Well, actually I would say that the performance is quite
varied -- it seems that some things got faster and other things got
slower. Maybe depending on the shape of the patches that appear in the
given benchmark. I wish there was a better way to measure this...
Yours,
Petr.
|
msg12093 (view) |
Author: kowey |
Date: 2010-08-10.18:15:31 |
|
Just marking Petr as reviewer for this if that's OK.
|
msg12108 (view) |
Author: dagit |
Date: 2010-08-11.06:20:14 |
|
I found the mistake. When I left factored the white space parsing in
Read.hs I missed one spot. There are 3 calls to readMerger, two of
which are not inside readPrim. I had only added the skipSpace call to
the one in readPrim. This patch bundle fixes that.
Jason
15 patches for repository http://darcs.net:
Sun Aug 8 13:11:40 PDT 2010 Jason Dagit <dagit@codersbase.com>
* improve interface of peekforw and bracketedFL
Sun Aug 8 16:34:24 PDT 2010 Jason Dagit <dagit@codersbase.com>
* tidy up readPatchInfo
Sun Aug 8 17:18:09 PDT 2010 Jason Dagit <dagit@codersbase.com>
* remove unused word8 parser
Sun Aug 8 17:18:50 PDT 2010 Jason Dagit <dagit@codersbase.com>
* tidy up anyChar parser
Sun Aug 8 19:04:46 PDT 2010 Jason Dagit <dagit@codersbase.com>
* tidy up skipNewline
Sun Aug 8 19:04:57 PDT 2010 Jason Dagit <dagit@codersbase.com>
* tidy up linesStartingWith
Sun Aug 8 19:05:39 PDT 2010 Jason Dagit <dagit@codersbase.com>
* optimize char parser
Sun Aug 8 19:05:47 PDT 2010 Jason Dagit <dagit@codersbase.com>
* remove unneeded export
Sun Aug 8 19:15:22 PDT 2010 Jason Dagit <dagit@codersbase.com>
* tidy up readPatches
Sun Aug 8 19:21:15 PDT 2010 Jason Dagit <dagit@codersbase.com>
* remove unused import
Sun Aug 8 19:21:45 PDT 2010 Jason Dagit <dagit@codersbase.com>
* tidy up peekforc
Sun Aug 8 19:44:22 PDT 2010 Jason Dagit <dagit@codersbase.com>
* tidy up Real.hs and refactor space lexing
Sun Aug 8 20:28:21 PDT 2010 Jason Dagit <dagit@codersbase.com>
* tidy up Info.hs
The comment about break no longer applies as break is no longer used.
Using choice instead of (<|>) keeps linesStartingWithEndingWith a
little prettier.
Tue Aug 10 23:10:27 PDT 2010 Jason Dagit <dagit@codersbase.com>
* handle whitespace lexing more intelligently in Read.hs
Tue Aug 10 23:12:08 PDT 2010 Jason Dagit <dagit@codersbase.com>
* ReadMonad.hs: fix haddocks
Attachments
|
msg12149 (view) |
Author: dagit |
Date: 2010-08-13.06:16:44 |
|
This should set the performance of the refactored parser back on par
with the previous parser.
Everything else in this email is just story and context for interested
parties, with evidence about the performance.
Petr, do I have your blessing that the new parser is now 'fast enough'
and I may work on other things without risking you being grumpy about
the performance at 2.6 release time?
Thanks,
Jason
---------------------------------------------------------------------
The main hurdles:
* linesStartingWith/linesStartingWithEndingWith are huge bottle
necks.
* Profiling did not show the above two functions as bottle necks
because my refactor of them was built on top of the parser
primitives, so those showed up in the profiles instead.
* (>>=) for the parser is fast, but when it's called billions of
times it starts to look like a bottle neck.
Once I realized that linesStartingWith was the real culprit, and
admitted to myself that it needed access to the parser's internal
state inorder to be implemented efficiently, then I was able to get
the performance of this parser to match the previous darcs parser.
I made extensive use of a synthetic patch, constructed thusly:
* I looked at the gziped patches under _darcs and picked one that
had a good mixture of patch types and long hunks.
* I copied the body of the patch (everything not including the
patchinfo at the top) and repeatedly pasted it until the patch was
22megs in size (uncompressed size).
Next I used criterion to benchmark the reading of this 22mb patch
file.
You can get the code I use for benchmarking from here:
darcs get --lazy http://code.haskell.org/~dagit/darcs-parser
Then you hack the cabal of the darcs you want to benchmark with to be
version 2.4.98.2, build it and install it. If cabal gives a warning
about incompatible versions of parsec, ignore it.
When you run the benchmark you'll get nice output like this (this is
after applying the attached patch bundle):
benchmarking readPatch/{}
collecting 100 samples, 1 iterations each, in estimated 27.50721 s
bootstrapping with 100000 resamples
mean: 221.4402 ms, lb 217.0538 ms, ub 226.1490 ms, ci 0.950
std dev: 23.41066 ms, lb 21.00892 ms, ub 27.47362 ms, ci 0.950
variance introduced by outliers: 1.000%
variance is unaffected by outliers
The interesting bit is the mean of 221ms. The darcs parser without my
changes had this output:
benchmarking readPatch/{}
collecting 100 samples, 1 iterations each, in estimated 25.47681 s
bootstrapping with 100000 resamples
mean: 203.9187 ms, lb 199.1081 ms, ub 209.0355 ms, ci 0.950
std dev: 25.38274 ms, lb 22.85717 ms, ub 31.75149 ms, ci 0.950
variance introduced by outliers: 1.000%
variance is unaffected by outliers
Here the mean is a litte better, 203ms. But, keep in mind that's only
a 2 second difference over 2200mb of patch data! The difference in
mean is also less than the std dev of either version. Compare that to
the refactored parser BEFORE the fixes in this bundle:
benchmarking readPatch/{}
collecting 100 samples, 1 iterations each, in estimated 83.97970 s
bootstrapping with 100000 resamples
mean: 803.7558 ms, lb 798.7546 ms, ub 808.7250 ms, ci 0.950
std dev: 25.64119 ms, lb 22.36137 ms, ub 29.52504 ms, ci 0.950
variance introduced by outliers: 0.998%
variance is unaffected by outliers
I haven't included any more darcs-benchmark numbers, but I think this
stress test is far more compelling for this particular line of work.
16 patches for repository http://darcs.net:
Sun Aug 8 13:11:40 PDT 2010 Jason Dagit <dagit@codersbase.com>
* improve interface of peekforw and bracketedFL
Sun Aug 8 16:34:24 PDT 2010 Jason Dagit <dagit@codersbase.com>
* tidy up readPatchInfo
Sun Aug 8 17:18:09 PDT 2010 Jason Dagit <dagit@codersbase.com>
* remove unused word8 parser
Sun Aug 8 17:18:50 PDT 2010 Jason Dagit <dagit@codersbase.com>
* tidy up anyChar parser
Sun Aug 8 19:04:46 PDT 2010 Jason Dagit <dagit@codersbase.com>
* tidy up skipNewline
Sun Aug 8 19:04:57 PDT 2010 Jason Dagit <dagit@codersbase.com>
* tidy up linesStartingWith
Sun Aug 8 19:05:39 PDT 2010 Jason Dagit <dagit@codersbase.com>
* optimize char parser
Sun Aug 8 19:05:47 PDT 2010 Jason Dagit <dagit@codersbase.com>
* remove unneeded export
Sun Aug 8 19:15:22 PDT 2010 Jason Dagit <dagit@codersbase.com>
* tidy up readPatches
Sun Aug 8 19:21:15 PDT 2010 Jason Dagit <dagit@codersbase.com>
* remove unused import
Sun Aug 8 19:21:45 PDT 2010 Jason Dagit <dagit@codersbase.com>
* tidy up peekforc
Sun Aug 8 19:44:22 PDT 2010 Jason Dagit <dagit@codersbase.com>
* tidy up Real.hs and refactor space lexing
Sun Aug 8 20:28:21 PDT 2010 Jason Dagit <dagit@codersbase.com>
* tidy up Info.hs
The comment about break no longer applies as break is no longer used.
Using choice instead of (<|>) keeps linesStartingWithEndingWith a
little prettier.
Tue Aug 10 23:10:27 PDT 2010 Jason Dagit <dagit@codersbase.com>
* handle whitespace lexing more intelligently in Read.hs
Tue Aug 10 23:12:08 PDT 2010 Jason Dagit <dagit@codersbase.com>
* ReadMonad.hs: fix haddocks
Thu Aug 12 22:54:54 PDT 2010 Jason Dagit <dagit@codersbase.com>
* dramatically improve the parser performance
This puts the parser's performance back to where it was before the
move to a Parsec-like API. The cruxt of the fix was to implement
linesStartingWith as a primitive instead of implementing it using
other parser primitives. Several other tricks are included, such as
using strict tuples that are partially specialized.
Attachments
|
msg12152 (view) |
Author: mornfall |
Date: 2010-08-13.09:42:48 |
|
Hi,
Jason Dagit <bugs@darcs.net> writes:
> This should set the performance of the refactored parser back on par
> with the previous parser.
>
> Everything else in this email is just story and context for interested
> parties, with evidence about the performance.
>
> Petr, do I have your blessing that the new parser is now 'fast enough'
> and I may work on other things without risking you being grumpy about
> the performance at 2.6 release time?
yes, you have my blessing (I am trusting you on the numbers, as I am
swamped with other work, and won't have time to look at it properly for
maybe couple more days). Thanks for working on this. :)
Yours,
Petr.
|
msg12240 (view) |
Author: dagit |
Date: 2010-08-19.23:33:51 |
|
It's been almost a week since there was an update on this ticket.
Does the reviewer need to be reassigned for load balancing purposes?
Thanks,
Jason
|
msg12248 (view) |
Author: dagit |
Date: 2010-08-22.07:27:23 |
|
Reassigning this. Petr appears to be busy.
|
msg12250 (view) |
Author: dagit |
Date: 2010-08-22.11:04:00 |
|
Assigning this to Eric. Ganesh said he's probably too busy to review it.
|
msg12255 (view) |
Author: ganesh |
Date: 2010-08-22.13:23:53 |
|
Actually I'm fine to review right now, there's just a risk of me
disappearing for a while suddenly so assigning me reviews is a bit
dicey.
Anyway, I've reviewed this and it looks fine so I'm pushing it.
One strong suggestion for a followup is to rename the new 'State' type
to 'ParserState' or similar to avoid confusion with the state monad.
A minor style point I noticed:
> + choice [ do ls <- lswew
> + return (l:ls)
(which itself is a refactoring from an explicit bind) could be expressed
with <$> or fmap instead - choice [(l:) <$> lswew, ...
|
msg12256 (view) |
Author: darcswatch |
Date: 2010-08-22.13:51:30 |
|
This patch bundle (with 15 patches) was just applied to the repository http://darcs.net/.
This message was brought to you by DarcsWatch
http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-35efac5a3691d873c8c760e6660b7d88ece8bda5
|
msg12257 (view) |
Author: darcswatch |
Date: 2010-08-22.13:51:32 |
|
This patch bundle (with 16 patches) was just applied to the repository http://darcs.net/.
This message was brought to you by DarcsWatch
http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-ffe1ab0b429a194cae90674036b87d27eb4f7932
|
msg14096 (view) |
Author: darcswatch |
Date: 2011-05-10.18:06:22 |
|
This patch bundle (with 16 patches) was just applied to the repository http://darcs.net/reviewed.
This message was brought to you by DarcsWatch
http://darcswatch.nomeata.de/repo_http:__darcs.net_reviewed.html#bundle-ffe1ab0b429a194cae90674036b87d27eb4f7932
|
msg14240 (view) |
Author: darcswatch |
Date: 2011-05-10.20:06:04 |
|
This patch bundle (with 15 patches) was just applied to the repository http://darcs.net/reviewed.
This message was brought to you by DarcsWatch
http://darcswatch.nomeata.de/repo_http:__darcs.net_reviewed.html#bundle-35efac5a3691d873c8c760e6660b7d88ece8bda5
|
|
Date |
User |
Action |
Args |
2010-08-10 02:57:12 | dagit | create | |
2010-08-10 02:59:12 | darcswatch | set | darcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-300d5b7dde46a8d99592af5338920a377b9e88f7 |
2010-08-10 03:04:52 | dagit | set | messages:
+ msg12081 |
2010-08-10 11:16:36 | mornfall | set | nosy:
+ mornfall messages:
+ msg12083 |
2010-08-10 18:15:31 | kowey | set | assignedto: mornfall messages:
+ msg12093 nosy:
+ kowey |
2010-08-11 06:20:14 | dagit | set | files:
+ improve-interface-of-peekforw-and-bracketedfl.dpatch, unnamed messages:
+ msg12108 |
2010-08-11 06:21:54 | darcswatch | set | darcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-300d5b7dde46a8d99592af5338920a377b9e88f7 -> http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-35efac5a3691d873c8c760e6660b7d88ece8bda5 |
2010-08-13 06:16:44 | dagit | set | files:
+ improve-interface-of-peekforw-and-bracketedfl.dpatch, unnamed messages:
+ msg12149 |
2010-08-13 06:18:40 | darcswatch | set | darcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-35efac5a3691d873c8c760e6660b7d88ece8bda5 -> http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-ffe1ab0b429a194cae90674036b87d27eb4f7932 |
2010-08-13 09:42:48 | mornfall | set | messages:
+ msg12152 |
2010-08-19 23:33:51 | dagit | set | messages:
+ msg12240 |
2010-08-22 07:27:23 | dagit | set | assignedto: mornfall -> ganesh messages:
+ msg12248 nosy:
+ ganesh |
2010-08-22 11:04:00 | dagit | set | assignedto: ganesh -> kowey messages:
+ msg12250 |
2010-08-22 13:23:53 | ganesh | set | status: needs-review -> accepted assignedto: kowey -> ganesh messages:
+ msg12255 |
2010-08-22 13:51:30 | darcswatch | set | messages:
+ msg12256 |
2010-08-22 13:51:32 | darcswatch | set | messages:
+ msg12257 |
2011-05-10 18:06:22 | darcswatch | set | messages:
+ msg14096 |
2011-05-10 20:06:04 | darcswatch | set | messages:
+ msg14240 |
2011-05-10 21:06:10 | darcswatch | set | darcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-ffe1ab0b429a194cae90674036b87d27eb4f7932 -> http://darcswatch.nomeata.de/repo_http:__darcs.net_reviewed.html#bundle-300d5b7dde46a8d99592af5338920a377b9e88f7 |
|