Created on 2008-12-19.19:39:59 by zooko, last changed 2010-06-15.21:50:57 by admin.
msg6835 (view) |
Author: zooko |
Date: 2008-12-19.19:39:55 |
|
darcs get failed on one of the tahoe-lafs buildslaves:
http://allmydata.org/buildbot/builders/hardy2.6/builds/145/steps/darcs/logs/stdio
It appears to be a sporadic failure. The error message is:
darcs:
/home/buildslave/.darcs/cache/pristine.hashed/0000029303-d270307a184c266d2548acb104c502b9767e79b5a3eed1aa2acf32a72813da0e-new:
renameFile: does not exist (No such file or directory)
and then "darcs get" emitted no output for 1200 seconds so it was killed for
taking too long.
The error message makes me wonder if there are some problems with the ordering
or locking of access to files in the shared cache. Perhaps that file (which
ends with "-new") was created in one step and then relinked in a second step,
and perhaps the file was gone by the time darcs tried to execute the second
step, possibly because another instance of darcs had removed it while using the
cache?
|
msg6836 (view) |
Author: zooko |
Date: 2008-12-19.19:45:14 |
|
I was able to reproduce this issue by running the following bash script in 3
separate windows at the same time:
while [ $? = 0 ] ; do /usr/local/bin/darcs get --verbose --partial
http://allmydata.org/source/tahoe/trunk-hashedformat ; done
One of the three immediately stopped (on its first try), with this output:
zooko@deharo1:~/dx$ while [ $? = 0 ] ; do /usr/local/bin/darcs get --verbose
--partial http://allmydata.org/source/tahoe/trunk-hashedformat ; done
Directory '/home/zooko/dx/trunk-hashedformat' already exists, creating
repository as '/home/zooko/dx/trunk-hashedformat_3'
darcs:
/home/zooko/.darcs/cache/pristine.hashed/0000039343-2a1543ca08c01f7aa296e5bad409525b0dbe29b7bba67b3825b8f66a774997d1-new:
renameFile: does not exist (No such file or directory)
Copying pristine 19/38 : allmydata.icns
I don't know if it is making progress or if it is permanently hung. The other
two loops are continuing to run. This is darcs-2.1.2.
|
msg6837 (view) |
Author: zooko |
Date: 2008-12-19.20:30:55 |
|
So it looks like that darcs process is hung. It isn't using any CPU, but it
still says this after about almost an hour:
zooko@deharo1:~/dx$ while [ $? = 0 ] ; do /usr/local/bin/darcs get --verbose
--partial http://allmydata.org/source/tahoe/trunk-hashedformat ; done
Directory '/home/zooko/dx/trunk-hashedformat' already exists, creating
repository as '/home/zooko/dx/trunk-hashedformat_3'
darcs:
/home/zooko/.darcs/cache/pristine.hashed/0000039343-2a1543ca08c01f7aa296e5bad409525b0dbe29b7bba67b3825b8f66a774997d1-new:
renameFile: does not exist (No such file or directory)
Copying pristine 19/38 : allmydata.icns
|
msg6838 (view) |
Author: zooko |
Date: 2008-12-19.21:18:46 |
|
Possible work-arounds in light of the fact that darcs-2.2 is already in frozen mode:
1. Make the cache be off by default. Wouldn't really be a solid work-around,
and would deprive darcs-2.2 users who don't change the default of having the
caching feature.
2. Add file locking to all accesses to the cache.
|
msg6854 (view) |
Author: zooko |
Date: 2008-12-21.22:05:50 |
|
This happened again on that buildslave:
https://dev.allmydata.com/buildbot-tahoe/builders/hardy2.6
I'm going to try disabling the darcs cache on that account.
|
msg6884 (view) |
Author: mornfall |
Date: 2008-12-25.00:00:54 |
|
Another workaround would be to use -new.$PID intead of just -new. Then,
concurrent darcs instances wouldn't run over eachothers' temporary files in the
cache. Although it does raise the question why temporaries are created in the
cache at all.
I'll look.
...
Ok, I have looked. The temporaries are created there likely because it's the
first writeable directory encountered. The patches are downloaded there and
hardlinked and/or copied into the repository in question.
The quick fix would be, as proposed, to append our pid to the tempfile, the
changes should be limited to src/URL.hs, where the "-new" suffix is used in a
few places. Then, one of those renameFile's would be last and that one would
win. It still races (and may produce several copies of the file instead of just
one with hardlinks), but it's better than dying.
If noone picks this up soon, I'll send a patch. I consider this release
critical for 2.2.
|
msg6885 (view) |
Author: mornfall |
Date: 2008-12-25.00:02:04 |
|
Where I say patches, I really mean files (this is the case for pristine
contents just as well as for patches themselves). Just to clear up any possible
confusion...
|
msg6888 (view) |
Author: zooko |
Date: 2008-12-25.00:16:21 |
|
Thanks, Petr! I'm glad you are prioritizing this for 2.2!
Regards,
Zooko
---
Tahoe, the Least-Authority Filesystem -- http://allmydata.org
backup all your files for $10/month -- http://allmydata.com/?
tracking=zsig
|
msg6892 (view) |
Author: mornfall |
Date: 2008-12-26.11:26:56 |
|
The following patch updated the status of issue1273 to be resolved:
* resolve issue1273: Work around a race in download code.
When we had multiple darcsen downloading the same repo, with cache enabled,
they would stomp on each other's temporary files and die horribly.
|
msg6939 (view) |
Author: zooko |
Date: 2008-12-30.08:12:18 |
|
I got what looked like this problem again, using darcs-2.1.2:
http://allmydata.org/buildbot/builders/hardy2.6/builds/159/steps/darcs/logs/stdi
o
I ran the script again of having 3 parallel while loops doing a darcs get --
lazy, but it did not reproduce the issue. I went ahead and built 2.1.2 + the
patch that claims that it fixes issue1273, and ran the 3 parallel while loops on
that, and it too worked, so I installed 2.1.2+issue1273 on that allmydata.org
server.
|
|
Date |
User |
Action |
Args |
2008-12-19 19:39:59 | zooko | create | |
2008-12-19 19:45:17 | zooko | set | status: unread -> unknown nosy:
kowey, zooko, simon, thorkilnaur, dmitry.kurochkin messages:
+ msg6836 |
2008-12-19 20:30:59 | zooko | set | nosy:
kowey, zooko, simon, thorkilnaur, dmitry.kurochkin messages:
+ msg6837 |
2008-12-19 21:18:48 | zooko | set | nosy:
kowey, zooko, simon, thorkilnaur, dmitry.kurochkin messages:
+ msg6838 |
2008-12-21 22:05:54 | zooko | set | nosy:
kowey, zooko, simon, thorkilnaur, dmitry.kurochkin messages:
+ msg6854 |
2008-12-25 00:00:57 | mornfall | set | nosy:
+ mornfall messages:
+ msg6884 |
2008-12-25 00:02:07 | mornfall | set | nosy:
kowey, zooko, simon, thorkilnaur, dmitry.kurochkin, mornfall messages:
+ msg6885 |
2008-12-25 00:16:25 | zooko | set | nosy:
kowey, zooko, simon, thorkilnaur, dmitry.kurochkin, mornfall messages:
+ msg6888 |
2008-12-26 11:27:01 | mornfall | set | status: unknown -> resolved-in-unstable nosy:
kowey, zooko, simon, thorkilnaur, dmitry.kurochkin, mornfall messages:
+ msg6892 |
2008-12-29 17:40:24 | mornfall | set | topic:
+ Target-2.2 nosy:
kowey, zooko, simon, thorkilnaur, dmitry.kurochkin, mornfall |
2008-12-30 08:12:22 | zooko | set | nosy:
kowey, zooko, simon, thorkilnaur, dmitry.kurochkin, mornfall messages:
+ msg6939 |
2009-04-22 03:39:29 | twb | set | status: resolved-in-unstable -> resolved nosy:
kowey, zooko, simon, thorkilnaur, dmitry.kurochkin, mornfall |
2009-08-25 17:36:26 | admin | set | nosy:
+ darcs-devel, - simon |
2009-08-27 14:19:47 | admin | set | nosy:
kowey, darcs-devel, zooko, thorkilnaur, dmitry.kurochkin, mornfall |
2010-06-15 21:50:57 | admin | set | milestone: 2.2.x |
2010-06-15 21:50:57 | admin | set | topic:
- Target-2.2 |
|