darcs

Issue 1075 pull of GHC HEAD into GHC STABLE takes a long time

Title pull of GHC HEAD into GHC STABLE takes a long time
Priority bug Status wont-fix
Milestone Resolved in
Superseder Nosy List darcs-devel, dmitry.kurochkin, kowey, thorkilnaur
Assigned To
Topics Conflicts, Performance

Created on 2008-09-14.12:19:28 by thorkilnaur, last changed 2009-08-27.14:32:25 by admin.

Files
File name Uploaded Type Edit Remove
Crude_strobes_using_hpc.dpatch.gz thorkilnaur, 2008-09-14.12:19:25 application/x-gzip
strobeHpcDiff1.sh thorkilnaur, 2008-09-14.12:22:26 application/x-sh
strobeHpc_darcs_3600_SUB_3599_20080914_130546.tar.gz thorkilnaur, 2008-09-14.12:21:55 application/x-gzip
Messages
msg6009 (view) Author: thorkilnaur Date: 2008-09-14.12:19:25
This is in response to http://bugs.darcs.net/msg5703:

> Author: simonpj       Date: 2008-08-26.09:22:03
> | I tried to pull the GHC HEAD into this STABLE repository, both from http://darcs.haskell.org/ghc and
> | from a
> | local repository. In both cases, I interrupted the pull after it's having spent 100% CPU for more than
> | an
> | hour without apparent progress.
> |
> | So this seems to be a dead end.
>
> Not a dead end at all!  It's another darcs bug!  Having pull take 100% of the CPU for an hour is an absolute 
disaster 
when it comes to user confidence.  If you have a reproducible test case, I urge you to examine it.
>
> Simon

To examine this case more closely, I have used the basic machinery provided by hpc (Haskell Program Coverage). 
Briefly, I intrumented a recent darcs as indicated in the attached patch to print the hpc tix counters to a series 
of files regularly, once every second. This modified darcs also needed compiling with -fhpc (the ./configure flag -
-enable-hpc). By subtracting (hpc combine --function=SUB) two such sets of counters, one gets a set of counters 
representing the activity between the writing of the two files. Some additional postprocessing (hpc markup --
highlight-covered) then produces a set of .html files with the active parts of the source code highlighted.

The attached strobeHpc_darcs_3600_SUB_3599_20080914_130546.tar.gz file represents the result of examining darcs 
pulling GHC HEAD from a local repository into GHC STABLE in the last second before having executed for one whole 
hour. Only .html files representing darcs source files with actual activity have been included. Producing such a 
set of files from two .tix files is done by a bit of shell'ery (the attached strobeHpcDiff1.sh).

I have tried to produce the same set of .html files for the same run, but representing the last 10 minutes before 
the hour and they were identical to the ones of the last second. So I feel quite confident that the .html files 
represent faithfully what actually goes on.

It seems encouraging that apparently only a small fraction of the darcs code is active, but I have, myself, not 
investigated any further. It may be that the information in the .html files doesn't add to a case like this, but I 
suggest that somebody with knowledge about darcs internals take a look and judge. 

Best regards
Thorkil
Attachments
msg6012 (view) Author: kowey Date: 2008-09-15.07:16:04
This looks very interesting.  Something to pursue for darcs 2.1

On Sun, Sep 14, 2008 at 12:19:28 -0000, Thorkil Naur wrote:
> The attached strobeHpc_darcs_3600_SUB_3599_20080914_130546.tar.gz file represents the result of examining darcs 
> pulling GHC HEAD from a local repository into GHC STABLE in the last second before having executed for one whole 
> hour. Only .html files representing darcs source files with actual activity have been included. Producing such a 
> set of files from two .tix files is done by a bit of shell'ery (the attached strobeHpcDiff1.sh).

I think you may have forgotten the attachment.

Also, should your darcs patch go into darcs, or is it meant to be set
aside and only used with hpc?  Perhaps it could be enabled via some
ifdefs and then go into mainline darcs?
msg6013 (view) Author: thorkilnaur Date: 2008-09-15.08:33:58
Replying to http://bugs.darcs.net/msg6012 (Author: kowey Date: 2008-09-15.07:16:04):    

> I think you may have forgotten the attachment.

I attached one file when submitting the issue initially and two others in separate updates to the 
issue. I presently see all three files attached to the issue. Are you unable to access these files?

> 
> Also, should your darcs patch go into darcs, or is it meant to be set
> aside and only used with hpc?  Perhaps it could be enabled via some
> ifdefs and then go into mainline darcs?

I used a patch to describe the changes that I made to darcs because it was most convenient. In its 
present form, the patch is not intended to go into mainline darcs. A more polished version (allowing 
conditional compilation and, perhaps, setting the time interval via some flag) could be worked out, if 
useful.

Best regards
Thorkil
msg8069 (view) Author: kowey Date: 2009-08-10.15:35:20
Hi Thorkil,

I think it would be really useful for anybody studying this if we could have
tarballs of both repositories.  By any chance do you have something like this on
hand, or at least some recollection of what repos you downloaded?  

PS: For the interested, Thorkil's hpc-strobe is now on hackage at
http://hackage.haskell.org/package/hpc-strobe
msg8409 (view) Author: thorkilnaur Date: 2009-08-23.17:10:32
Hi Eric,

A while back, I did some additional experiments related to this issue, but since 
the experiments didn't really seem to gain me any additional insight, I never 
reported the results. Recall that the problem is a pull of a GHC HEAD repository 
of about August 2008 into a GHC STABLE repository of about September 2007 that 
takes a very long time, hours, perhaps forever. The experiments amounted to 
closing the gap between the source repository being pulled from and the target 
repository being pulled into, retrying the pull, to see if anything interesting 
turned up.

Both repositories are darcs-1 repositories, so it seemed likely, even when the 
issue was reported, that this was simply an instance of "the big conflicts bug", 
that "The time needed to resolve or detect conflicts grows exponentially with 
depth of conflict nesting" (http://wiki.darcs.net/ConflictsFAQDarcs1#what-the-
big-conflicts-bug). This seemed to be confirmed by the experiments, in that I 
found several cases where the time to pull the source into the target increased 
dramatically by just increasing their difference by a single patch.

To comply with your request, I have prepared a couple of repositories, 
http://thorkilnaur.dk/~tn/darcs/darcsIssue1075/darcs_Issue_1075_source.tar.gz 
and 
http://thorkilnaur.dk/~tn/darcs/darcsIssue1075/darcs_Issue_1075_target.tar.gz, 
to demonstrate this. Locally, these files look like this:

> tn@tn21$ ls -l
> total 154352
> -rw-r--r--  1 tn  tn  96415047 Aug 23 15:21 darcs_Issue_1075_source.tar.gz
> -rw-r--r--  1 tn  tn  61511823 Aug 23 15:24 darcs_Issue_1075_target.tar.gz
> tn@tn21$ md5 darcs_Issue_1075_*
> MD5 (darcs_Issue_1075_source.tar.gz) = 4cd96734907dc8e965caa779caeac4a5
> MD5 (darcs_Issue_1075_target.tar.gz) = 274d0f8dd628ea837abb9e664ed58b33
> tn@tn21$

Here is a transcript of a session that illustrates the issue:

> $ darcs --version
> 2.1.2 (release)
> $ wget 
http://thorkilnaur.dk/~tn/darcs/darcsIssue1075/darcs_Issue_1075_source.tar.gz
> ...
> $ md5 darcs_Issue_1075_source.tar.gz 
> MD5 (darcs_Issue_1075_source.tar.gz) = 4cd96734907dc8e965caa779caeac4a5
> $ tar xzf darcs_Issue_1075_source.tar.gz 
> $ darcs changes --last=2 --repodir=darcs_Issue_1075_source/ghc 
> Fri Dec 28 19:40:24 CET 2007  Clemens Fruhwirth <clemens@endorphin.org>
>   * Install dynlibs correctly
>   
>   Add dynlibdir target to config.mk.in, setting it to @libdir@.
>   Invoke installPackage with dynlibdir at libraries/Makefile
>   Make installPackage.hs hand dynlibdir to Cabal.
> 
> Fri Dec 28 18:57:27 CET 2007  Isaac Dupree <id@isaac.cedarswampstudios.org>
>   * import ord that alex secretly imported
> $ wget 
http://thorkilnaur.dk/~tn/darcs/darcsIssue1075/darcs_Issue_1075_target.tar.gz
> ...
> $ md5 darcs_Issue_1075_target.tar.gz 
> MD5 (darcs_Issue_1075_target.tar.gz) = 274d0f8dd628ea837abb9e664ed58b33
> $ tar xzf darcs_Issue_1075_target.tar.gz 
> $ darcs changes --last=2 --repodir=darcs_Issue_1075_target/ghc
> Sun Sep 16 02:53:28 CEST 2007  Ian Lynagh <igloo@earth.li>
>   * Another attempt at getting bindists working everywhere
> 
> Fri Sep 14 22:34:11 CEST 2007  Ian Lynagh <igloo@earth.li>
>   * MERGED: Better modelling of newtypes in the Term datatype
>   Pepe Iborra <mnislaih@gmail.com>**20070912165855
>   
>    This helps to get pretty printing right,
>    nested newtypes were not being shown correctly by :print
> $

Now pull the source into the target. This takes a long time:

> $ time darcs pull -a --repodir=darcs_Issue_1075_target/ghc 
darcs_Issue_1075_source/ghc 
> This is the GHC darcs repository (HEAD branch)
> 
> For more information, visit the GHC developer wiki at
>   http://hackage.haskell.org/trac/ghc
> **********************
> Backing up ./compiler/Makefile(-darcs-backup0)                                  
> Backing up ./compiler/main/InteractiveEval.hs(-darcs-backup0)
> Backing up ./compiler/ghci/Debugger.hs(-darcs-backup0)
> Backing up ./compiler/ghci/RtClosureInspect.hs(-darcs-backup0)
> Backing up ./configure.ac(-darcs-backup0)
> Backing up ./compiler/nativeGen/RegAllocColor.hs(-darcs-backup0)
> Backing up ./compiler/ghc-inplace.c(-darcs-backup0)
> Backing up ./libraries/Makefile(-darcs-backup0)
> Backing up ./Makefile(-darcs-backup0)
> Backing up ./libraries/installPackage.hs(-darcs-backup0)
> Backing up ./compiler/main/DynFlags.hs(-darcs-backup0)
> We have conflicts in the following files:
> ./Makefile ./compiler/Makefile ./compiler/ghc-inplace.c 
./compiler/ghci/Debugger.hs ./compiler/ghci/RtClosureInspect.hs 
./compiler/main/DynFlags.hs ./compiler/main/InteractiveEval.hs 
./compiler/nativeGen/RegAllocColor.hs ./compiler/utils/GraphColor.hs 
./configure.ac ./libraries/Makefile ./libraries/installPackage.hs
> Finished pulling and applying.
>                                                   
> real    44m19.336s
> user    43m14.772s
> sys     0m17.611s
> $

Now unpack the target again for a new attempt:

> $ rm -r darcs_Issue_1075_target
> $ tar xzf darcs_Issue_1075_target.tar.gz 

Before pulling the source into the target, unpull a single patch from the 
source:

> $ darcs unpull --last=1 --repodir=darcs_Issue_1075_source/ghc 
> Fri Dec 28 19:40:24 CET 2007  Clemens Fruhwirth <clemens@endorphin.org>
>   * Install dynlibs correctly
>   
>   Add dynlibdir target to config.mk.in, setting it to @libdir@.
>   Invoke installPackage with dynlibdir at libraries/Makefile
>   Make installPackage.hs hand dynlibdir to Cabal.
> Shall I unpull this patch? (1/1)  [ynWvpxdaqjk], or ? for help: y
> Finished unpulling.              
> $ darcs changes --last=2 --repodir=darcs_Issue_1075_source/ghc
> Fri Dec 28 18:57:27 CET 2007  Isaac Dupree <id@isaac.cedarswampstudios.org>
>   * import ord that alex secretly imported
> 
> Thu Dec 27 18:13:35 CET 2007  Isaac Dupree <id@isaac.cedarswampstudios.org>
>   * add missing import that happy -agc secretly provided
> $ time darcs pull -a --repodir=darcs_Issue_1075_target/ghc 
darcs_Issue_1075_source/ghc
> This is the GHC darcs repository (HEAD branch)
> 
> For more information, visit the GHC developer wiki at
>   http://hackage.haskell.org/trac/ghc
> **********************
> Backing up ./compiler/Makefile(-darcs-backup0)                                  
> Backing up ./compiler/main/InteractiveEval.hs(-darcs-backup0)
> Backing up ./compiler/ghci/Debugger.hs(-darcs-backup0)
> Backing up ./compiler/ghci/RtClosureInspect.hs(-darcs-backup0)
> Backing up ./configure.ac(-darcs-backup0)
> Backing up ./compiler/nativeGen/RegAllocColor.hs(-darcs-backup0)
> Backing up ./compiler/ghc-inplace.c(-darcs-backup0)
> Backing up ./libraries/Makefile(-darcs-backup0)
> Backing up ./Makefile(-darcs-backup0)
> Backing up ./libraries/installPackage.hs(-darcs-backup0)
> Backing up ./compiler/main/DynFlags.hs(-darcs-backup0)
> We have conflicts in the following files:
> ./Makefile ./compiler/Makefile ./compiler/ghc-inplace.c 
./compiler/ghci/Debugger.hs ./compiler/ghci/RtClosureInspect.hs 
./compiler/main/DynFlags.hs ./compiler/main/InteractiveEval.hs 
./compiler/nativeGen/RegAllocColor.hs ./compiler/utils/GraphColor.hs 
./configure.ac ./libraries/Makefile ./libraries/installPackage.hs
> Finished pulling and applying.                                                  
>                            
> real    2m22.360s
> user    2m3.731s
> sys     0m9.721s
> $ 

So much faster.

But honestly, I believe that we should close this bug, either as wontfix 
(because we are not fixing darcs-1 code) or as resolved (although this would 
assume that the problem has been tested as gone for darcs-2, something that I 
have not attempted).

Best regards
Thorkil
msg8410 (view) Author: thorkilnaur Date: 2009-08-23.17:18:35
Hi Eric,

Fiddle status and assigning to you, hopefully suitable.

Best regards
Thorkil
msg8412 (view) Author: kowey Date: 2009-08-23.17:35:01
Hi Thorkil: Just to make sure I understand this before we wont-fix this, do you
remember off-hand if GHC STABLE (target) was a subset of GHC HEAD (source)? 

The conflicts explanation would be easiest to understand if this was not the case.

It's OK if you don't remember; we'll just have take some time to find out otherwise.

(re-assigning to you but just to help me keep track of my query)
msg8421 (view) Author: thorkilnaur Date: 2009-08-23.18:29:01
Hi Eric,

I don't think I ever knew whether the set of patches of the target was a subset of 
the patches of the source (which is what I presume you would like to know). But a 
simple diff of the _darcs/patches directories of the two repositories show that 
there are 143 patches in the target that are not in the source, 1760 patches in 
the source that are not in the target, and 377 patches that are in both, but 
differ in one way or the other. I hope that this information helps, but feel free 
to ask for other checks to be carried out to decide the matter.

Best regards
Thorkil
msg8423 (view) Author: kowey Date: 2009-08-23.18:35:15
On Sun, Aug 23, 2009 at 18:29:06 +0000, Thorkil Naur wrote:
> But a simple diff of the _darcs/patches directories of the two
> repositories show that there are 143 patches in the target that are
> not in the source, 1760 patches in the source that are not in the
> target, and 377 patches that are in both, but differ in one way or the
> other. I hope that this information helps, but feel free to ask for
> other checks to be carried out to decide the matter.

Hmm.  Could you compare the inventories? 
It might be good to sort them just in case they are in different order
msg8428 (view) Author: thorkilnaur Date: 2009-08-23.19:18:46
Hi Eric,

I am not sure what you are asking for here.

Best regards
Thorkil
msg8429 (view) Author: kowey Date: 2009-08-23.19:43:13
It might be good to diff _darcs/inventory from the two repositories.
Because the patches may be in different order it may be good to perform a sort
on them to avoid spurious differences.
msg8476 (view) Author: thorkilnaur Date: 2009-08-24.18:38:21
Hi Eric,

The two _darcs/inventory files do not have much in common:

> $ head darcs_Issue_1075_target/ghc/_darcs/inventory                                              
Starting with tag:
> [TAG ghc-6.8 branched 2007-09-03
> Ian Lynagh <igloo@earth.li>**20070903155416]
> [Change the version number to 6.8 on the branch
> Ian Lynagh <igloo@earth.li>**20070903172054]
> [Fix building RTS with gcc 2.*; declare all variables at the top of a block
> Ian Lynagh <igloo@earth.li>**20070903165847>  Patch from Audrey Tang.
> ]    
> [remove debugging code
> $ head darcs_Issue_1075_source/ghc/_darcs/inventory
> Starting with tag:
> [TAG 2007-11-11
> Ian Lynagh <igloo@earth.li>**20071111161540]
> [Turn -fprint-bind-result off by default
> Ian Lynagh <igloo@earth.li>**20071111001126]
> [arrows is no longer an extralib
> Ian Lynagh <igloo@earth.li>**20071027123656] > [Remove the X11 and HGL 
libraries from extralibs
> Ian Lynagh <igloo@earth.li>**20071111213447
>  Don Stewart, X11 maintainer, requested we remove X11, and HGL depends on it
> $

I have sorted these files and their diff has almost as many lines as the files 
combined:

> $ wc -l darcs_Issue_1075_target.inventory darcs_Issue_1075_source.inventory 
>      670 darcs_Issue_1075_target.inventory
>     1057 darcs_Issue_1075_source.inventory
>     1727 total
> $ diff darcs_Issue_1075_target.inventory darcs_Issue_1075_source.inventory | wc 
-l
>     1457
> $ 

Comparing the _darcs/inventories directories:

> $ diff -r darcs_Issue_1075_target/ghc/_darcs/inventories/ 
darcs_Issue_1075_source/ghc/_darcs/inventories/
> Only in darcs_Issue_1075_target/ghc/_darcs/inventories/: 20070903155416-3fd76-
19bef68d1fa1b2cf190a06ce878867dd9b08ec5b.gz
> Only in darcs_Issue_1075_source/ghc/_darcs/inventories/: 20070925164536-3fd76-
07330cc5e4fca0c0f60bac53a10981c10b0fbb5f.gz
> Only in darcs_Issue_1075_source/ghc/_darcs/inventories/: 20071111161540-3fd76-
6f45848e09339188332b3d539b92bf95219c73d0.gz
> Only in darcs_Issue_1075_target/ghc/_darcs/inventories/: 20071111164452-3fd76-
5b6394153966f91d01962e5b157a09d4f6295d3c.gz
> Only in darcs_Issue_1075_source/ghc/_darcs/inventories/: 20080528003600-3fd76-
b01e2ea33fd5e49a0a86d3dd1a43545dfb00e1d2.gz
> Only in darcs_Issue_1075_source/ghc/_darcs/inventories/: 20080601155241-3fd76-
13b3337cedf6323aab252f8035c39326411e3614.gz
> Only in darcs_Issue_1075_target/ghc/_darcs/inventories/: 20080605235309-3fd76-
3c341a5531d47a506ae36a87e1f72b88cefa5be7.gz
> $

Concatenating and sorting all the differing inventories (including the 
_darcs/inventory) of the respective repositories and comparing shows that there 
are certainly lines in the set of inventories of the target repository that are 
not in the set of inventories of the source repository, although there is 
significant overlap:

> $ wc -l darcs_Issue_1075_target.inventories darcs_Issue_1075_source.inventories 
>     4444 darcs_Issue_1075_target.inventories
>     8656 darcs_Issue_1075_source.inventories
>    13100 total
> $ diff darcs_Issue_1075_target.inventories darcs_Issue_1075_source.inventories 
| wc -l
>     8259
> $ 

Best regards
Thorkil
msg8477 (view) Author: kowey Date: 2009-08-24.21:14:48
From your analysis, I think I agree that this is merely darcs-1 conflict misery
and that the only way forward in this case is to upgrade the repository to
darcs-2 format.  Many thanks!

Minor note: resetting to status 'unknown' was perfect, but I'd assign it to 'no
selection'; in principle you don't know who's doing the triage work.  It doesn't
really matter though especially in a tight discussion loop.
History
Date User Action Args
2008-09-14 12:19:28thorkilnaurcreate
2008-09-14 12:21:56thorkilnaursetfiles: + strobeHpc_darcs_3600_SUB_3599_20080914_130546.tar.gz
2008-09-14 12:22:28thorkilnaursetfiles: + strobeHpcDiff1.sh
2008-09-15 07:16:07koweysetstatus: unread -> unknown
messages: + msg6012
2008-09-15 08:34:00thorkilnaursetmessages: + msg6013
2009-08-10 15:31:04koweysettopic: + Performance
nosy: + dmitry.kurochkin, simon
2009-08-10 15:35:23koweysetstatus: unknown -> waiting-for
nosy: kowey, dagit, simon, thorkilnaur, dmitry.kurochkin
messages: + msg8069
assignedto: thorkilnaur
2009-08-10 23:45:32adminsetnosy: - dagit
2009-08-23 17:10:40thorkilnaursetnosy: kowey, simon, thorkilnaur, dmitry.kurochkin
messages: + msg8409
2009-08-23 17:18:37thorkilnaursetstatus: waiting-for -> unknown
nosy: kowey, simon, thorkilnaur, dmitry.kurochkin
messages: + msg8410
assignedto: thorkilnaur -> kowey
2009-08-23 17:35:09koweysetstatus: unknown -> waiting-for
nosy: kowey, simon, thorkilnaur, dmitry.kurochkin
topic: + Conflicts
messages: + msg8412
assignedto: kowey -> thorkilnaur
2009-08-23 18:29:06thorkilnaursetstatus: waiting-for -> unknown
nosy: kowey, simon, thorkilnaur, dmitry.kurochkin
messages: + msg8421
assignedto: thorkilnaur -> kowey
2009-08-23 18:35:18koweysetnosy: kowey, simon, thorkilnaur, dmitry.kurochkin
messages: + msg8423
2009-08-23 19:18:51thorkilnaursetnosy: kowey, simon, thorkilnaur, dmitry.kurochkin
messages: + msg8428
2009-08-23 19:43:18koweysetstatus: unknown -> waiting-for
nosy: kowey, simon, thorkilnaur, dmitry.kurochkin
messages: + msg8429
assignedto: kowey -> thorkilnaur
2009-08-24 18:38:27thorkilnaursetstatus: waiting-for -> unknown
nosy: kowey, simon, thorkilnaur, dmitry.kurochkin
messages: + msg8476
assignedto: thorkilnaur -> kowey
2009-08-24 21:14:54koweysetstatus: unknown -> wont-fix
nosy: kowey, simon, thorkilnaur, dmitry.kurochkin
messages: + msg8477
assignedto: kowey ->
2009-08-25 18:14:26adminsetnosy: + darcs-devel, - simon
2009-08-27 14:32:25adminsetnosy: kowey, darcs-devel, thorkilnaur, dmitry.kurochkin