darcs

Issue 1273 renameFile: does not exist (No such file or directory)

Title renameFile: does not exist (No such file or directory)
Priority bug Status resolved
Milestone 2.2.x Resolved in
Superseder Nosy List darcs-devel, dmitry.kurochkin, kowey, mornfall, thorkilnaur, zooko
Assigned To
Topics

Created on 2008-12-19.19:39:59 by zooko, last changed 2010-06-15.21:50:57 by admin.

Messages
msg6835 (view) Author: zooko Date: 2008-12-19.19:39:55
darcs get failed on one of the tahoe-lafs buildslaves:

http://allmydata.org/buildbot/builders/hardy2.6/builds/145/steps/darcs/logs/stdio

It appears to be a sporadic failure.  The error message is:

darcs:
/home/buildslave/.darcs/cache/pristine.hashed/0000029303-d270307a184c266d2548acb104c502b9767e79b5a3eed1aa2acf32a72813da0e-new:
renameFile: does not exist (No such file or directory)

and then "darcs get" emitted no output for 1200 seconds so it was killed for
taking too long.

The error message makes me wonder if there are some problems with the ordering
or locking of access to files in the shared cache.  Perhaps that file (which
ends with "-new") was created in one step and then relinked in a second step,
and perhaps the file was gone by the time darcs tried to execute the second
step, possibly because another instance of darcs had removed it while using the
cache?
msg6836 (view) Author: zooko Date: 2008-12-19.19:45:14
I was able to reproduce this issue by running the following bash script in 3
separate windows at the same time:

while [ $? = 0 ] ; do /usr/local/bin/darcs get --verbose --partial
http://allmydata.org/source/tahoe/trunk-hashedformat ; done

One of the three immediately stopped (on its first try), with this output:

zooko@deharo1:~/dx$ while [ $? = 0 ] ; do /usr/local/bin/darcs get --verbose
--partial http://allmydata.org/source/tahoe/trunk-hashedformat ; done
Directory '/home/zooko/dx/trunk-hashedformat' already exists, creating
repository as '/home/zooko/dx/trunk-hashedformat_3'
darcs:
/home/zooko/.darcs/cache/pristine.hashed/0000039343-2a1543ca08c01f7aa296e5bad409525b0dbe29b7bba67b3825b8f66a774997d1-new:
renameFile: does not exist (No such file or directory)

Copying pristine 19/38 : allmydata.icns

I don't know if it is making progress or if it is permanently hung.  The other
two loops are continuing to run.  This is darcs-2.1.2.
msg6837 (view) Author: zooko Date: 2008-12-19.20:30:55
So it looks like that darcs process is hung.  It isn't using any CPU, but it
still says this after about almost an hour:

zooko@deharo1:~/dx$ while [ $? = 0 ] ; do /usr/local/bin/darcs get --verbose
--partial http://allmydata.org/source/tahoe/trunk-hashedformat ; done
Directory '/home/zooko/dx/trunk-hashedformat' already exists, creating
repository as '/home/zooko/dx/trunk-hashedformat_3'
darcs:
/home/zooko/.darcs/cache/pristine.hashed/0000039343-2a1543ca08c01f7aa296e5bad409525b0dbe29b7bba67b3825b8f66a774997d1-new:
renameFile: does not exist (No such file or directory)

Copying pristine 19/38 : allmydata.icns
msg6838 (view) Author: zooko Date: 2008-12-19.21:18:46
Possible work-arounds in light of the fact that darcs-2.2 is already in frozen mode:

1.  Make the cache be off by default.  Wouldn't really be a solid work-around,
and would deprive darcs-2.2 users who don't change the default of having the
caching feature.

2.  Add file locking to all accesses to the cache.
msg6854 (view) Author: zooko Date: 2008-12-21.22:05:50
This happened again on that buildslave:


https://dev.allmydata.com/buildbot-tahoe/builders/hardy2.6

I'm going to try disabling the darcs cache on that account.
msg6884 (view) Author: mornfall Date: 2008-12-25.00:00:54
Another workaround would be to use -new.$PID intead of just -new. Then, 
concurrent darcs instances wouldn't run over eachothers' temporary files in the 
cache. Although it does raise the question why temporaries are created in the 
cache at all.

I'll look.

...

Ok, I have looked. The temporaries are created there likely because it's the 
first writeable directory encountered. The patches are downloaded there and 
hardlinked and/or copied into the repository in question.

The quick fix would be, as proposed, to append our pid to the tempfile, the 
changes should be limited to src/URL.hs, where the "-new" suffix is used in a 
few places. Then, one of those renameFile's would be last and that one would 
win. It still races (and may produce several copies of the file instead of just 
one with hardlinks), but it's better than dying.

If noone picks this up soon, I'll send a patch. I consider this release 
critical for 2.2.
msg6885 (view) Author: mornfall Date: 2008-12-25.00:02:04
Where I say patches, I really mean files (this is the case for pristine 
contents just as well as for patches themselves). Just to clear up any possible 
confusion...
msg6888 (view) Author: zooko Date: 2008-12-25.00:16:21
Thanks, Petr!  I'm glad you are prioritizing this for 2.2!

Regards,

Zooko
---
Tahoe, the Least-Authority Filesystem -- http://allmydata.org
backup all your files for $10/month -- http://allmydata.com/? 
tracking=zsig
msg6892 (view) Author: mornfall Date: 2008-12-26.11:26:56
The following patch updated the status of issue1273 to be resolved:

* resolve issue1273: Work around a race in download code. 

When we had multiple darcsen downloading the same repo, with cache enabled,
they would stomp on each other's temporary files and die horribly.
msg6939 (view) Author: zooko Date: 2008-12-30.08:12:18
I got what looked like this problem again, using darcs-2.1.2:

http://allmydata.org/buildbot/builders/hardy2.6/builds/159/steps/darcs/logs/stdi
o

I ran the script again of having 3 parallel while loops doing a darcs get --
lazy, but it did not reproduce the issue.  I went ahead and built 2.1.2 + the 
patch that claims that it fixes issue1273, and ran the 3 parallel while loops on 
that, and it too worked, so I installed 2.1.2+issue1273 on that allmydata.org 
server.
History
Date User Action Args
2008-12-19 19:39:59zookocreate
2008-12-19 19:45:17zookosetstatus: unread -> unknown
nosy: kowey, zooko, simon, thorkilnaur, dmitry.kurochkin
messages: + msg6836
2008-12-19 20:30:59zookosetnosy: kowey, zooko, simon, thorkilnaur, dmitry.kurochkin
messages: + msg6837
2008-12-19 21:18:48zookosetnosy: kowey, zooko, simon, thorkilnaur, dmitry.kurochkin
messages: + msg6838
2008-12-21 22:05:54zookosetnosy: kowey, zooko, simon, thorkilnaur, dmitry.kurochkin
messages: + msg6854
2008-12-25 00:00:57mornfallsetnosy: + mornfall
messages: + msg6884
2008-12-25 00:02:07mornfallsetnosy: kowey, zooko, simon, thorkilnaur, dmitry.kurochkin, mornfall
messages: + msg6885
2008-12-25 00:16:25zookosetnosy: kowey, zooko, simon, thorkilnaur, dmitry.kurochkin, mornfall
messages: + msg6888
2008-12-26 11:27:01mornfallsetstatus: unknown -> resolved-in-unstable
nosy: kowey, zooko, simon, thorkilnaur, dmitry.kurochkin, mornfall
messages: + msg6892
2008-12-29 17:40:24mornfallsettopic: + Target-2.2
nosy: kowey, zooko, simon, thorkilnaur, dmitry.kurochkin, mornfall
2008-12-30 08:12:22zookosetnosy: kowey, zooko, simon, thorkilnaur, dmitry.kurochkin, mornfall
messages: + msg6939
2009-04-22 03:39:29twbsetstatus: resolved-in-unstable -> resolved
nosy: kowey, zooko, simon, thorkilnaur, dmitry.kurochkin, mornfall
2009-08-25 17:36:26adminsetnosy: + darcs-devel, - simon
2009-08-27 14:19:47adminsetnosy: kowey, darcs-devel, zooko, thorkilnaur, dmitry.kurochkin, mornfall
2010-06-15 21:50:57adminsetmilestone: 2.2.x
2010-06-15 21:50:57adminsettopic: - Target-2.2