Created on 2009-06-03.22:39:55 by zooko, last changed 2017-07-30.23:42:02 by gh.
msg7873 (view) |
Author: zooko |
Date: 2009-06-03.22:39:45 |
|
There follows a message that I wrote to the darcs-users mailing
list. Eric Kow asked me to open a ticket for it, so here it is. He
also asked what parameters cause this to be slow. I think, but am
not sure, that the main issue is a deep history. Projects which have
many thousands of patches seem more likely to be slow.
Regards,
Zooko
Begin forwarded message:
> From: Zooko Wilcox-O'Hearn <zooko@zooko.com>
> Date: May 27, 2009 10:26:07 AM MDT
> To: Eric Kow <kowey@darcs.net>
> Cc: Petr Rockai <me@mornfall.net>, Darcs Mailing list <darcs-
> users@darcs.net>
> Subject: Re: [darcs-users] Hashed-storage & darcs 2.3 (Was; Re: 2.3
> release schedule)
>
> I have a request: speed up the operations which users have to wait
> on when they browse a darcs repository through trac, and deploy it
> as soon as possible!
>
> For example, the front page of http://darcs.net has a link to this
> page:
>
> http://allmydata.org/trac/darcs-2/browser
>
> If you load that page, trac issues the darcs commands like these:
>
> 2009-05-27 09:15:14,095 Trac[command] DEBUG: cd /home/source/darcs/
> darcs-2/trunk-hashedformat; TZ=UTC darcs --version
> 2009-05-27 09:15:14,244 Trac[command] DEBUG: cd /home/source/darcs/
> darcs-2/trunk-hashedformat; TZ=UTC darcs query contents --quiet --
> match "hash 20090513075931-
> c31ef-94cc8b3d6a88ae270963d60d6d5426bd0bda7245.gz" "GNUmakefile"
> 2009-05-27 09:15:14,638 Trac[command] DEBUG: cd /home/source/darcs/
> darcs-2/trunk-hashedformat; TZ=UTC darcs query contents --quiet --
> match "hash 20090513075931-
> c31ef-94cc8b3d6a88ae270963d60d6d5426bd0bda7245.gz" "configure.ac"
>
> If there are enough files which have been changed since the last
> time trac queried darcs and cached the result, then this can take
> many seconds. While it is processing, other users of the same trac
> will get a "database is locked" error if they try to view the same
> repository. On some repositories that I use, it takes more than 30
> seconds and trac times-out (believing that the database or revision
> control tool must have locked up) and gives an error message.
>
> Also if you view the contents of a file like this:
>
> http://allmydata.org/trac/darcs-2/browser/src/darcs.tex
>
> Or the history of a file or directory, like this:
>
> http://allmydata.org/trac/darcs-2/log/src/darcs.tex
>
> Then trac will issue a 'darcs query contents --match "hash ..."'
> command.
>
> So, it would be great for me if 'darcs query contents --match
> "hash ..."' were as fast as possible, and if this speed-up were
> released in a stable and completely bug-free new version of darcs
> as soon as possible. ;-)
>
> Thanks!
>
> Regards,
>
> Zooko
> _______________________________________________
> darcs-users mailing list
> darcs-users@darcs.net
> http://lists.osuosl.org/mailman/listinfo/darcs-users
|
msg7881 (view) |
Author: kowey |
Date: 2009-06-04.06:17:51 |
|
Marking 'Performance' and updating to reflect the new command name (we've got to
have a common vocabulary here)
|
msg7888 (view) |
Author: zooko |
Date: 2009-06-04.15:52:58 |
|
I just set up a new trac showing this repository:
http://allmydata.org/source/cryptopp/trunk . Here is the trac:
http://allmydata.org/trac/cryptopp . If you click on "Browse Source" and then
click on the directory named "c5" then it will time out and give an error.
That's because there are 290 files in "c5", and for each file trac issues a
command like
darcs query contents --quiet --match "hash
20021004173141-0a92f-14ec3ac46e0a267c7eaaa7342de0f999da20f406.gz" "tailor.config"
That one command takes 44 seconds on my server. That's about 100 times as long
as would be usable. To reproduce this, run "darcs get
http://allmydata.org/source/cryptopp/trunk" and then cd into it and run:
darcs query contents --quiet --match "hash
20021004173141-0a92f-14ec3ac46e0a267c7eaaa7342de0f999da20f406.gz" "tailor.config"
Please let me know how long that command takes on your machine and with your
version of darcs.
|
msg7889 (view) |
Author: zooko |
Date: 2009-06-04.15:59:58 |
|
Hm, no a speedup by a factor of 100 would not make this usable if trac does the
same command for each of the 290 files and it takes as long for each one. Let's
say for this to be usable you need to get a listing of the directory which
contains 290 files in, let's say, less than 10 seconds. So darcs has to provide
whatever information it is that trac needs in about 300 milliseconds per file.
Maybe trac or Lele's trac-darcs plugin could generate the appropriate web page
listing the directory *without* needing to query darcs about each file?
|
msg7890 (view) |
Author: kowey |
Date: 2009-06-04.16:05:03 |
|
Adding Gwern because he may be interested for filestore, and Benedikt as well
because of his filecache work (hope you don't mind)
It'd be good to see if the issue here is match hash (and not the show contents
command in itself).
Also, another thing to consider is if it's the same thing that annotate and
changes are affected by, just having to comb through all patches, when we could
just be grabbing the patches that affect a specific file instead. This is where
Benedikt's work comes in.
|
msg7891 (view) |
Author: kowey |
Date: 2009-06-04.16:05:46 |
|
As for the multiple file use case, darcs diff may be useful. It also accepts
matchers.
|
msg7900 (view) |
Author: lele |
Date: 2009-06-04.22:32:24 |
|
On Thu, 04 Jun 2009 16:00:02 -0000
Zooko <bugs@darcs.net> wrote:
> Maybe trac or Lele's trac-darcs plugin could generate the appropriate
> web page listing the directory *without* needing to query darcs about
> each file?
It may be worth remembering that trac-darcs heavily caches files's
content, and it should be calling darcs just once per file per
revision.
Maybe a simple optimization could be handling "darcs query contents
fileA fileB fileC" at least with --xml-output: I bet the time it takes
darcs to compute all the files is a fraction of the N calls to compute
them one at a time.
ciao, lele.
--
nickname: Lele Gaifax | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas | comincerò ad aver paura di chi mi copia.
lele@nautilus.homeip.net | -- Fortunato Depero, 1929.
|
msg8317 (view) |
Author: kowey |
Date: 2009-08-21.17:01:14 |
|
On Zooko's cryptopp repository.
Here we search for (a) darcs changes --match 'hash' with something that's far
back in the history (b) darcs show contents with the same as a, (c) darcs show
contents matching on the most recent patch (d) darcs show contents on HEAD
without actually using any matchers.
Petr says that he was able to improve matcher efficiency in his branch of darcs,
which sounds very promising, but I'm not sure what the implications of these
improvements are exactly.
As we can see in (a) just matching and (c) just loading the repository into
memory are not particularly costly on their own [although there is *some*
overhead, as we see in (d)]. What seems to be hurting us most is when we have
to go unapply a bunch of patches to our in-memory representation of the tree.
But does that mean this bit in particular is slow, or could it be something
ridiculous like us holding on patches way after we're done looking at them?
I'd like to see timings with old darcs and his darcs, which is why I'm assigning
this to Petr.
Afterwards, I suggest:
- adding this sort of test to the benchmarks
- more profiling (and general thinking about things)
I'm also throwing in issue124 (for the filecache work). As a good way on
cutting down on variables however, it may be a good idea to work with some demo
repositories where there is only ever one file in the repo, just a whole lot of
patches that affect that one file. This might give us a better understanding of
what's going on.
$ time darcs changes --quiet --match "hash
20021004173141-0a92f-14ec3ac46e0a267c7eaaa7342de0f999da20f406.gz"
"tailor.config" > /dev/null
real 0m0.117s
user 0m0.020s
sys 0m0.008s
$ time darcs show contents --quiet --match "hash
20021004173141-0a92f-14ec3ac46e0a267c7eaaa7342de0f999da20f406.gz"
"tailor.config" > /dev/null
real 0m30.343s
user 0m8.417s
sys 0m12.757s
$ time darcs show contents --quiet --match "hash
20090711225018-e3536-21ecdecdcefba3c64f3407fd049acc83558ecd97.gz"
"tailor.config" > /dev/null
real 0m3.401s
user 0m0.288s
sys 0m0.136s
$ time darcs show contents --quiet "tailor.config" > /dev/null
real 0m0.043s
user 0m0.004s
sys 0m0.012s
|
msg8997 (view) |
Author: kowey |
Date: 2009-10-22.09:39:06 |
|
I think I mis-filed this. Marking need-action because we need somebody to
generate timings for this using darcs-hs (comparing to current darcs)
|
msg9192 (view) |
Author: kowey |
Date: 2009-11-03.17:35:46 |
|
I have some updates to this. The main user affected is trac-darcs (and
presumably other web interfaces or guis around darcs repositories). I'll speak
from the point of view of trac-darcs
source browser [workaround found, trac-darcs 0.7]
-------------------------------------------------
On 2009-07, Zooko pointed reminded us about this problem for trac-darcs. Petr
then pointed out that trac-darcs could actually avoid using matchers for getting
the HEAD version of the file. This according to Zooko's report made trac-darcs
much more usable.
See http://lists.osuosl.org/pipermail/darcs-users/2009-July/020387.html and also
a recent chat about this http://irclog.perlgeek.de/darcs/2009-11-03#i_1681206
So fixing this would only really matter for the case where users specifically
decide to run the source browser on an older version of a file or directory.
changeset viewing [still a problem]
-----------------------------------
Zooko pointed out another case where this affects us,
http://lists.osuosl.org/pipermail/darcs-users/2009-October/021906.html
Petr has suggested some possible ideas on how to make this better in practice
(see the thread for details)
One workaround I've suggested to Lele was to use darcs diff, as you'd only have
to run it once instead of repeating the query for each individual file
this ticket
-----------
Now that darcs-hs has been merged into the mainline, it would still be useful to
have some timings against 2.2.0 of this darcs show contents --match 'hash X'
operation.
|
msg9962 (view) |
Author: zooko |
Date: 2010-02-07.16:25:28 |
|
> So fixing this would only really matter for the case where users
specifically decide to run the source browser on an older version of a
file or directory.
Just to be clear this is a common case. If you want to browse the most
recent version of the source you can just use your text editor on your
local repo, but if you want to browse an older version you need more
tool support. We often browse older versions in the Tahoe-LAFS
development process.
Another problem is that trac locks the entire database while invoking
trac-darcs. So if any user clicks on "annotate" then all users get a
timeout from the entire web site http://allmydata.org/trac/tahoe-lafs
until the "darcs annotate" completes.
Now that's clearly something that could be improved by trac itself -- if
it knew which queries to the revision control plugin were side-effect
free then it could leave the database unlocked when issuing those
commands. Then other users would not be blocked and as long as the
revision control tool was concurrency-safe for multiple simultaneous
read queries and at most one write command nothing bad would happen.
However, it would be really nice if darcs were so fast that it was
usuable for this use case even when trac locks the database for all queries.
There was some more discussion on the mailing list:
http://lists.osuosl.org/pipermail/darcs-users/2010-February/023034.html
|
|
Date |
User |
Action |
Args |
2009-06-03 22:39:55 | zooko | create | |
2009-06-04 06:17:56 | kowey | set | status: unread -> unknown topic:
+ Performance title: please speed up "darcs query contents" -> please speed up "darcs show contents" nosy:
kowey, zooko, simon, thorkilnaur, dmitry.kurochkin messages:
+ msg7881 priority: feature |
2009-06-04 15:53:01 | zooko | set | nosy:
kowey, zooko, simon, thorkilnaur, dmitry.kurochkin messages:
+ msg7888 |
2009-06-04 16:00:02 | zooko | set | nosy:
+ lele messages:
+ msg7889 |
2009-06-04 16:05:06 | kowey | set | nosy:
kowey, zooko, lele, simon, thorkilnaur, dmitry.kurochkin messages:
+ msg7890 title: please speed up "darcs show contents" -> speed up: darcs show contents --match "hash foo" |
2009-06-04 16:05:49 | kowey | set | nosy:
kowey, zooko, lele, simon, thorkilnaur, dmitry.kurochkin messages:
+ msg7891 |
2009-06-04 16:07:05 | kowey | set | nosy:
+ gwern, beschmi |
2009-06-04 22:32:29 | lele | set | nosy:
beschmi, kowey, zooko, lele, simon, thorkilnaur, gwern, dmitry.kurochkin messages:
+ msg7900 title: speed up: darcs show contents --match "hash foo" -> please speed up "darcs show contents" |
2009-08-06 21:14:06 | admin | set | nosy:
- beschmi |
2009-08-21 17:01:21 | kowey | set | status: unknown -> waiting-for topic:
+ Matchers messages:
+ msg8317 assignedto: mornfall nosy:
+ mornfall title: please speed up "darcs show contents" -> darcs show contents --match 'hash X' is slow superseder:
+ wish: performance of darcs changes <file> priority: feature -> bug |
2009-08-25 17:45:35 | admin | set | nosy:
+ darcs-devel, - simon |
2009-08-27 14:28:46 | admin | set | nosy:
kowey, darcs-devel, zooko, lele, thorkilnaur, gwern, dmitry.kurochkin, mornfall |
2009-10-22 09:36:00 | kowey | set | status: waiting-for -> unknown nosy:
kowey, darcs-devel, zooko, lele, thorkilnaur, gwern, dmitry.kurochkin, mornfall |
2009-10-22 09:39:09 | kowey | set | status: unknown -> needs-reproduction nosy:
kowey, darcs-devel, zooko, lele, thorkilnaur, gwern, dmitry.kurochkin, mornfall messages:
+ msg8997 |
2009-11-03 17:35:49 | kowey | set | messages:
+ msg9192 |
2010-02-07 16:25:32 | zooko | set | messages:
+ msg9962 |
2017-07-30 23:42:02 | gh | set | status: needs-reproduction -> given-up |
|