darcs

Issue 2280 failure on NFS

Title: failure on NFS
Priority: bug
Status: needs-reproduction
Nosy List: fx

Created on 2012-12-17.11:07:08 by fx, last changed 2020-08-01.09:07:11 by bf.

Messages
msg16420 Author: fx Date: 2012-12-17.11:07:07
I'm seeing problems with a repo on an NFSv4 mount (or an NFSv3 mount of
the same export).  The server is Solaris 10 and the client is RHEL5,
running darcs 2.8.3 built with GHC 6.12.3, though it also happened with
version 2.5 (I think).

Doing an obliterate, I either see

  darcs: _darcs/tentative_hashed_inventory-0: rename: resource busy (Device or resource busy)
or

  darcs: _darcs/index: removeLink: resource busy (Device or resource busy)

Copying the repo to local disk works fine.  I can't easily try other
client/server combinations at the moment, in case it's something
specific to this one.
msg16421 Author: markstos Date: 2012-12-17.15:02:54
What would you expect darcs to do in this case?

If the filesystem is reporting that it timed out on an operation, it seems
reasonable to bubble that up to the user.

Perhaps you'd like darcs to have an option for a higher timeout value for
filesystem operations if the filesystem appears "busy-but-functional"?
msg16422 Author: fx Date: 2012-12-17.17:27:19
Mark Stosberg <bugs@darcs.net> writes:

> Mark Stosberg <mark@summersault.com> added the comment:
>
> What would you expect darcs to do in this case?
>
> If the filesystem is reporting that it timed out on an operation, it seems
> reasonable to bubble that up to the user.

I see no evidence for filesystem timeouts.  Is that a known problem that
produces those symptoms?

I was hoping/expecting it was a known problem, despite the lack of bug
reports I could find.  I was guessing it would be related to locking, if
anything, but as I understand it, that should work on NFSv4.
msg16423 Author: markstos Date: 2012-12-17.18:44:49
From what I can tell, "Device or resource busy" comes not from Darcs or
GHC-land, but from Linux/NFS. See the search results here:

https://encrypted.google.com/search?q=nfs++%22Device+or+resource+busy%22

I'm tentatively marking this as "not our bug". If the issue can be 
reproduced in a way that is clearly darcs fault, please re-open it.
msg16444 Author: fx Date: 2012-12-20.16:35:43
Mark Stosberg <bugs@darcs.net> writes:

> Mark Stosberg <mark@summersault.com> added the comment:
>
> From what I can tell, "Device or resource busy" comes not from Darcs or
> GHC-land, but from Linux/NFS.

I realize it's NFS-related.  Perhaps I should have made clear originally
that I expected it to be known whether it works on NFS generally, though
I couldn't find anything to say so.

> See all the results for here:
>
> https://encrypted.google.com/search?q=nfs++%22Device+or+resource+busy%22
>
> I'm tentatively marking this as "not our bug". If the issue can be 
> reproduced in a way that is clearly darcs fault, please re-open it.

Are you saying that darcs simply isn't supported on NFS (or other --
AFS, CIFS, FUSE?) filesystems?  If so, it needs documenting.

Looking more closely, and assuming the "rename" message is actually from
rename(2), POSIX says:

  [EBUSY]
      [CX] [Option Start] The directory named by old or new is currently
      in use by the system or another process, and the implementation
      considers this an error. [Option End]

and the Debian rename(2) man page says:

       EBUSY  The rename fails because oldpath or newpath is a directory that
              is in use by some process (perhaps as current working directory,
              or as root directory, or because it was open for reading) or is
              in use by the system (for example as mount point), while the
              system considers this an error.  (Note that there is no
              requirement to return EBUSY in such cases -- there is nothing
              wrong with doing the rename anyway -- but it is allowed to
              return EBUSY if the system cannot otherwise handle such
              situations.)

which suggests it is darcs' "fault", but if it's not going to be fixed,
it would be useful for people to know they need to use a local
filesystem.  (It seems as if that's wise for performance reasons, but
that's a different issue.)
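
One way to check whether the EBUSY from rename(2) is transient on a given
NFS mount is to wrap the call in a small retry loop. The following is only
a minimal Python sketch under stated assumptions: `rename_with_retry` and
its `attempts`, `delay`, and `rename` parameters are hypothetical names made
up for illustration, and nothing here reflects how darcs itself performs or
retries the rename.

```python
import errno
import os
import time


def rename_with_retry(src, dst, attempts=5, delay=0.1, rename=os.rename):
    """Rename src to dst, retrying only when the OS reports EBUSY.

    Hypothetical helper for illustration; not part of darcs.
    Returns the number of attempts that were needed.
    """
    for attempt in range(1, attempts + 1):
        try:
            rename(src, dst)
            return attempt
        except OSError as exc:
            # Retry only on EBUSY. Any other error, or EBUSY on the
            # final attempt, is passed on to the caller unchanged.
            if exc.errno != errno.EBUSY or attempt == attempts:
                raise
            time.sleep(delay)
```

If the rename still fails with EBUSY after every attempt, the condition is
probably not a short-lived one, and a longer timeout of the kind suggested
earlier in this thread would be unlikely to help.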
msg16445 Author: fx Date: 2012-12-20.17:47:35
It turns out there's a test case for failure on NFS (nfs-failure.sh),
but I see it failing differently now:

  $ darcs init
  $ echo first > a
  $ darcs add a
  $ darcs record --pipe --all --name=first <<EOF
  > Thu Sep 18 22:37:06 MSD 2008
  > author
  > EOF
  darcs: _darcs/index: removeLink: resource busy (Device or resource busy)
msg22344 Author: bf Date: 2020-08-01.09:07:09
I have been using darcs over NFS (version 3) and have never observed this
failure.
History
Date                 User      Action  Args
2012-12-17 11:07:08  fx        create
2012-12-17 15:02:56  markstos  set     status: unknown -> waiting-for
                                       messages: + msg16421
2012-12-17 17:27:20  fx        set     messages: + msg16422
2012-12-17 18:44:50  markstos  set     priority: not-our-bug
                                       messages: + msg16423
2012-12-20 16:35:45  fx        set     messages: + msg16444
2012-12-20 17:47:36  fx        set     messages: + msg16445
2020-08-01 09:07:11  bf        set     priority: not-our-bug -> bug
                                       status: waiting-for -> needs-reproduction
                                       messages: + msg22344