darcs

Patch 337 handle whitespace lexing more intelligen... (and 16 more)

Title handle whitespace lexing more intelligen... (and 16 more)
Superseder Nosy List dagit, ganesh, kowey, mornfall
Related Issues
Status accepted Assigned To ganesh
Milestone

Created on 2010-08-10.02:57:12 by dagit, last changed 2011-05-10.21:06:10 by darcswatch. Tracked on DarcsWatch.

Files
File name Status Uploaded Type Edit Remove
handle-whitespace-lexing-more-intelligently-in-read_hs.dpatch dagit, 2010-08-10.02:57:12 text/x-darcs-patch
improve-interface-of-peekforw-and-bracketedfl.dpatch dagit, 2010-08-11.06:20:14 text/x-darcs-patch
improve-interface-of-peekforw-and-bracketedfl.dpatch dagit, 2010-08-13.06:16:44 text/x-darcs-patch
unnamed dagit, 2010-08-10.02:57:12
unnamed dagit, 2010-08-11.06:20:14
unnamed dagit, 2010-08-13.06:16:44
See mailing list archives for discussion on individual patches.
Messages
msg12079 (view) Author: dagit Date: 2010-08-10.02:57:12
Hello,

This is my attempt to improve the performance now that my refactor was
accepted.

The majority of these patches happened as a natural part of me staring
at the code thinking of how to optimize it.  Which is to say, if the
code was ugly and I had to stare at it I fet compelled to tidy it up.

This refactor comes with really only 3 optimizations:

Sun Aug  8 19:04:46 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * tidy up skipNewline

Sun Aug  8 19:04:57 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * tidy up linesStartingWith

Sun Aug  8 19:05:39 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * optimize char parser

The main one is the last one, "optimize char parser".  What I can't
tell is how much this stuff helps.

As for why is my new parser slower than the old one?  I tracked one
culprit down to the removal of maybeWork.  That patch changed the
linesStartingWith parser among other things so I've included what I
hope is a faster version of that function.  I removed the reversed
list building in favor of killing the tail call optimization.

Is there a way to know if this was a bad idea?  On the surface, it
avoids one O(n) operation, namely reverse.  It also removes the tail
recursion opportunity, but Haskell is a lazy language so it's hard to
predict, in my opinion, what the outcome will be.  My benchmarking
show that it took just as much memory either way and the runtime
didn't seem to change either.

In summary, take the three patches mentioned above for the performance
win, and take the rest if you're okay with it.

Thanks,
Jason

17 patches for repository http://darcs.net:

Sun Aug  8 13:05:33 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * handle whitespace lexing more intelligently in Read.hs

Sun Aug  8 13:11:40 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * improve interface of peekforw and bracketedFL

Sun Aug  8 16:34:24 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * tidy up readPatchInfo

Sun Aug  8 16:34:41 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * improve string parser

Sun Aug  8 17:18:09 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * remove unused word8 parser

Sun Aug  8 17:18:50 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * tidy up anyChar parser

Sun Aug  8 19:04:15 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * reorder alternatives in readPrim to optimize for the critical path

Sun Aug  8 19:04:46 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * tidy up skipNewline

Sun Aug  8 19:04:57 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * tidy up linesStartingWith

Sun Aug  8 19:05:39 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * optimize char parser

Sun Aug  8 19:05:47 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * remove unneeded export

Sun Aug  8 19:15:22 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * tidy up readPatches

Sun Aug  8 19:21:15 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * remove unused import

Sun Aug  8 19:21:45 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * tidy up peekforc

Sun Aug  8 19:34:45 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * fix the haddocks in ReadMonads.hs

Sun Aug  8 19:44:22 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * tidy up Real.hs and refactor space lexing

Sun Aug  8 20:28:21 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * tidy up Info.hs
  The comment about break no longer applies as break is no longer used.
  Using choice instead of (<|>) keeps linesStartingWithEndingWith a
  little prettier.
Attachments
msg12081 (view) Author: dagit Date: 2010-08-10.03:04:52
What I said about the improvement patches has a minor typo.  I forgot 
that: 
Sun Aug  8 13:05:33 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * handle whitespace lexing more intelligently in Read.hs

Also makes an important difference, along with the refactor to the string 
parser.
msg12083 (view) Author: mornfall Date: 2010-08-10.11:16:35
Hi,

Jason Dagit <bugs@darcs.net> writes:
> This is my attempt to improve the performance now that my refactor was
> accepted.
>
> The majority of these patches happened as a natural part of me staring
> at the code thinking of how to optimize it.  Which is to say, if the
> code was ugly and I had to stare at it I fet compelled to tidy it up.

I am glad that you are working on this. However, there may be still some
work left, since with this bundle applied, we are getting failures in
the testsuite.

You can see the output at
eg. http://buildbot.darcs.net/builders/6.10.4%20Debian%20TRY/builds/8/steps/test/logs/stdio

also, it seems (but I am not sure, I haven't studied the numbers closely
enough) that the refactor without this bundle performs better than after
this bundle. Well, actually I would say that the performance is quite
varied -- it seems that some things got faster and other things got
slower. Maybe depending on the shape of the patches that appear in the
given benchmark. I wish there was a better way to measure this...

Yours,
   Petr.
msg12093 (view) Author: kowey Date: 2010-08-10.18:15:31
Just marking Petr as reviewer for this if that's OK.
msg12108 (view) Author: dagit Date: 2010-08-11.06:20:14
I found the mistake.  When I left factored the white space parsing in
Read.hs I missed one spot.  There are 3 calls to readMerger, two of
which are not inside readPrim.  I had only added the skipSpace call to
the one in readPrim.  This patch bundle fixes that.

Jason

15 patches for repository http://darcs.net:

Sun Aug  8 13:11:40 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * improve interface of peekforw and bracketedFL

Sun Aug  8 16:34:24 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * tidy up readPatchInfo

Sun Aug  8 17:18:09 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * remove unused word8 parser

Sun Aug  8 17:18:50 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * tidy up anyChar parser

Sun Aug  8 19:04:46 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * tidy up skipNewline

Sun Aug  8 19:04:57 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * tidy up linesStartingWith

Sun Aug  8 19:05:39 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * optimize char parser

Sun Aug  8 19:05:47 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * remove unneeded export

Sun Aug  8 19:15:22 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * tidy up readPatches

Sun Aug  8 19:21:15 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * remove unused import

Sun Aug  8 19:21:45 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * tidy up peekforc

Sun Aug  8 19:44:22 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * tidy up Real.hs and refactor space lexing

Sun Aug  8 20:28:21 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * tidy up Info.hs
  The comment about break no longer applies as break is no longer used.
  Using choice instead of (<|>) keeps linesStartingWithEndingWith a
  little prettier.

Tue Aug 10 23:10:27 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * handle whitespace lexing more intelligently in Read.hs

Tue Aug 10 23:12:08 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * ReadMonad.hs: fix haddocks
Attachments
msg12149 (view) Author: dagit Date: 2010-08-13.06:16:44
This should set the performance of the refactored parser back on par
with the previous parser.

Everything else in this email is just story and context for interested
parties, with evidence about the performance.

Petr, do I have your blessing that the new parser is now 'fast enough'
and I may work on other things without risking you being grumpy about
the performance at 2.6 release time?

Thanks,
Jason

---------------------------------------------------------------------

The main hurdles:
  * linesStartingWith/linesStartingWithEndingWith are huge bottle
  necks.
  * Profiling did not show the above two functions as bottle necks
  because my refactor of them was built on top of the parser
  primitives, so those showed up in the profiles instead.
  * (>>=) for the parser is fast, but when it's called billions of
  times it starts to look like a bottle neck.

Once I realized that linesStartingWith was the real culprit, and
admitted to myself that it needed access to the parser's internal
state inorder to be implemented efficiently, then I was able to get
the performance of this parser to match the previous darcs parser.

I made extensive use of a synthetic patch, constructed thusly: 

  * I looked at the gziped patches under _darcs and picked one that
    had a good mixture of patch types and long hunks.
  * I copied the body of the patch (everything not including the
    patchinfo at the top) and repeatedly pasted it until the patch was
    22megs in size (uncompressed size).

Next I used criterion to benchmark the reading of this 22mb patch
file.

You can get the code I use for benchmarking from here:
  darcs get --lazy http://code.haskell.org/~dagit/darcs-parser

Then you hack the cabal of the darcs you want to benchmark with to be
version 2.4.98.2, build it and install it.  If cabal gives a warning
about incompatible versions of parsec, ignore it.

When you run the benchmark you'll get nice output like this (this is
after applying the attached patch bundle):

benchmarking readPatch/{}
collecting 100 samples, 1 iterations each, in estimated 27.50721 s
bootstrapping with 100000 resamples
mean: 221.4402 ms, lb 217.0538 ms, ub 226.1490 ms, ci 0.950
std dev: 23.41066 ms, lb 21.00892 ms, ub 27.47362 ms, ci 0.950
variance introduced by outliers: 1.000%
variance is unaffected by outliers

The interesting bit is the mean of 221ms.  The darcs parser without my
changes had this output:

benchmarking readPatch/{}
collecting 100 samples, 1 iterations each, in estimated 25.47681 s
bootstrapping with 100000 resamples
mean: 203.9187 ms, lb 199.1081 ms, ub 209.0355 ms, ci 0.950
std dev: 25.38274 ms, lb 22.85717 ms, ub 31.75149 ms, ci 0.950
variance introduced by outliers: 1.000%
variance is unaffected by outliers

Here the mean is a litte better, 203ms.  But, keep in mind that's only
a 2 second difference over 2200mb of patch data!  The difference in
mean is also less than the std dev of either version.  Compare that to
the refactored parser BEFORE the fixes in this bundle:

benchmarking readPatch/{}
collecting 100 samples, 1 iterations each, in estimated 83.97970 s
bootstrapping with 100000 resamples
mean: 803.7558 ms, lb 798.7546 ms, ub 808.7250 ms, ci 0.950
std dev: 25.64119 ms, lb 22.36137 ms, ub 29.52504 ms, ci 0.950
variance introduced by outliers: 0.998%
variance is unaffected by outliers

I haven't included any more darcs-benchmark numbers, but I think this
stress test is far more compelling for this particular line of work.

16 patches for repository http://darcs.net:

Sun Aug  8 13:11:40 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * improve interface of peekforw and bracketedFL

Sun Aug  8 16:34:24 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * tidy up readPatchInfo

Sun Aug  8 17:18:09 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * remove unused word8 parser

Sun Aug  8 17:18:50 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * tidy up anyChar parser

Sun Aug  8 19:04:46 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * tidy up skipNewline

Sun Aug  8 19:04:57 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * tidy up linesStartingWith

Sun Aug  8 19:05:39 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * optimize char parser

Sun Aug  8 19:05:47 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * remove unneeded export

Sun Aug  8 19:15:22 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * tidy up readPatches

Sun Aug  8 19:21:15 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * remove unused import

Sun Aug  8 19:21:45 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * tidy up peekforc

Sun Aug  8 19:44:22 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * tidy up Real.hs and refactor space lexing

Sun Aug  8 20:28:21 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * tidy up Info.hs
  The comment about break no longer applies as break is no longer used.
  Using choice instead of (<|>) keeps linesStartingWithEndingWith a
  little prettier.

Tue Aug 10 23:10:27 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * handle whitespace lexing more intelligently in Read.hs

Tue Aug 10 23:12:08 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * ReadMonad.hs: fix haddocks

Thu Aug 12 22:54:54 PDT 2010  Jason Dagit <dagit@codersbase.com>
  * dramatically improve the parser performance
  This puts the parser's performance back to where it was before the
  move to a Parsec-like API.  The cruxt of the fix was to implement
  linesStartingWith as a primitive instead of implementing it using
  other parser primitives.  Several other tricks are included, such as
  using strict tuples that are partially specialized.
Attachments
msg12152 (view) Author: mornfall Date: 2010-08-13.09:42:48
Hi,

Jason Dagit <bugs@darcs.net> writes:
> This should set the performance of the refactored parser back on par
> with the previous parser.
>
> Everything else in this email is just story and context for interested
> parties, with evidence about the performance.
>
> Petr, do I have your blessing that the new parser is now 'fast enough'
> and I may work on other things without risking you being grumpy about
> the performance at 2.6 release time?
yes, you have my blessing (I am trusting you on the numbers, as I am
swamped with other work, and won't have time to look at it properly for
maybe couple more days). Thanks for working on this. :)

Yours,
   Petr.
msg12240 (view) Author: dagit Date: 2010-08-19.23:33:51
It's been almost a week since there was an update on this ticket.

Does the reviewer need to be reassigned for load balancing purposes?

Thanks,
Jason
msg12248 (view) Author: dagit Date: 2010-08-22.07:27:23
Reassigning this.  Petr appears to be busy.
msg12250 (view) Author: dagit Date: 2010-08-22.11:04:00
Assigning this to Eric.  Ganesh said he's probably too busy to review it.
msg12255 (view) Author: ganesh Date: 2010-08-22.13:23:53
Actually I'm fine to review right now, there's just a risk of me 
disappearing for a while suddenly so assigning me reviews is a bit 
dicey.

Anyway, I've reviewed this and it looks fine so I'm pushing it.

One strong suggestion for a followup is to rename the new 'State' type 
to 'ParserState' or similar to avoid confusion with the state monad.


A minor style point I noticed:

> +         choice [ do ls <- lswew
> +                     return (l:ls)

(which itself is a refactoring from an explicit bind) could be expressed 
with <$> or fmap instead  - choice [(l:) <$> lswew, ...
msg12256 (view) Author: darcswatch Date: 2010-08-22.13:51:30
This patch bundle (with 15 patches) was just applied to the repository http://darcs.net/.
This message was brought to you by DarcsWatch
http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-35efac5a3691d873c8c760e6660b7d88ece8bda5
msg12257 (view) Author: darcswatch Date: 2010-08-22.13:51:32
This patch bundle (with 16 patches) was just applied to the repository http://darcs.net/.
This message was brought to you by DarcsWatch
http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-ffe1ab0b429a194cae90674036b87d27eb4f7932
msg14096 (view) Author: darcswatch Date: 2011-05-10.18:06:22
This patch bundle (with 16 patches) was just applied to the repository http://darcs.net/reviewed.
This message was brought to you by DarcsWatch
http://darcswatch.nomeata.de/repo_http:__darcs.net_reviewed.html#bundle-ffe1ab0b429a194cae90674036b87d27eb4f7932
msg14240 (view) Author: darcswatch Date: 2011-05-10.20:06:04
This patch bundle (with 15 patches) was just applied to the repository http://darcs.net/reviewed.
This message was brought to you by DarcsWatch
http://darcswatch.nomeata.de/repo_http:__darcs.net_reviewed.html#bundle-35efac5a3691d873c8c760e6660b7d88ece8bda5
History
Date User Action Args
2010-08-10 02:57:12dagitcreate
2010-08-10 02:59:12darcswatchsetdarcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-300d5b7dde46a8d99592af5338920a377b9e88f7
2010-08-10 03:04:52dagitsetmessages: + msg12081
2010-08-10 11:16:36mornfallsetnosy: + mornfall
messages: + msg12083
2010-08-10 18:15:31koweysetassignedto: mornfall
messages: + msg12093
nosy: + kowey
2010-08-11 06:20:14dagitsetfiles: + improve-interface-of-peekforw-and-bracketedfl.dpatch, unnamed
messages: + msg12108
2010-08-11 06:21:54darcswatchsetdarcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-300d5b7dde46a8d99592af5338920a377b9e88f7 -> http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-35efac5a3691d873c8c760e6660b7d88ece8bda5
2010-08-13 06:16:44dagitsetfiles: + improve-interface-of-peekforw-and-bracketedfl.dpatch, unnamed
messages: + msg12149
2010-08-13 06:18:40darcswatchsetdarcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-35efac5a3691d873c8c760e6660b7d88ece8bda5 -> http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-ffe1ab0b429a194cae90674036b87d27eb4f7932
2010-08-13 09:42:48mornfallsetmessages: + msg12152
2010-08-19 23:33:51dagitsetmessages: + msg12240
2010-08-22 07:27:23dagitsetassignedto: mornfall -> ganesh
messages: + msg12248
nosy: + ganesh
2010-08-22 11:04:00dagitsetassignedto: ganesh -> kowey
messages: + msg12250
2010-08-22 13:23:53ganeshsetstatus: needs-review -> accepted
assignedto: kowey -> ganesh
messages: + msg12255
2010-08-22 13:51:30darcswatchsetmessages: + msg12256
2010-08-22 13:51:32darcswatchsetmessages: + msg12257
2011-05-10 18:06:22darcswatchsetmessages: + msg14096
2011-05-10 20:06:04darcswatchsetmessages: + msg14240
2011-05-10 21:06:10darcswatchsetdarcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-ffe1ab0b429a194cae90674036b87d27eb4f7932 -> http://darcswatch.nomeata.de/repo_http:__darcs.net_reviewed.html#bundle-300d5b7dde46a8d99592af5338920a377b9e88f7