darcs

Issue 1753 darcs: opening of '_darcs/index' failed: does not exist (No such file or directory)

Title darcs: opening of '_darcs/index' failed: does not exist (No such file or directory)
Priority critical
Status resolved
Milestone 2.4.x
Resolved in 2.4.x
Superseder
Nosy List darcs-devel, dequuvae, dmitry.kurochkin, duncan, duncan.coutts, kowey, mornfall, tux_rocker, twb, wferi
Assigned To tux_rocker
Topics Hashed, Regression

Created on 2010-03-01.11:09:35 by twb, last changed 2010-06-16.14:40:39 by kowey.

Files
File name Uploaded Type
darcs-2.4.typescript twb, 2010-03-01.11:08:44 text/plain
Messages
msg10088 (view) Author: twb Date: 2010-03-01.11:09:32
During ./Setup test, I get

    darcs: opening of '_darcs/index' failed: does not exist (No such file or directory)

for lots of tests.  Full transcript attached.  I don't have time to
dig into this deeply today, sorry.  Note that I've applied a patch
from mornfall to allow hashed-storage to build against mmap 0.5.x.
msg10091 (view) Author: kowey Date: 2010-03-01.11:19:16
It sounds like it's important to prioritise this one, as we've had 3
independent reports (including Trent) of something like this happening
with folks trying out darcs 2.4.

So two potential variables to explore here:

 - GHC 6.12.1
 - the 64 bit machine

Any other ideas?

I think we need somebody to get in touch with mrothe and derrida from
#darcs and go through some interview/debugging with them.  Petr, may I
assign this to you, as it's index related?
msg10092 (view) Author: kowey Date: 2010-03-01.11:21:16
(Oh, an obvious third variable which I failed to notice would be the mmap
0.5.x patch, probably the first one to eliminate.)
msg10093 (view) Author: mornfall Date: 2010-03-01.11:58:09
A (32b) build with mmap 0.5.3 patch, darcs HEAD, hashed-storage HEAD 
works for me. I will test released versions later.
msg10096 (view) Author: kowey Date: 2010-03-01.13:30:12
Wagner (hereby added) has had the dubious pleasure of being victim #4 to
this issue.  He's provided us with some new hints: he has GHC <= 6.10.x,
and a 32 bit machine, but also the new mmap 0.5.x version of
hashed-storage.  Moreover, things are working for him when he downgrades
back to the released hashed-storage.

He's pointed out that even though this does not explain the problem for
victims #1 and #2, at least trying to figure out what went wrong in this
easy-to-reproduce scenario (ie. with mmap-0.5.x) may provide hints for
the more general problem...
msg10102 (view) Author: kowey Date: 2010-03-01.16:43:41
I'm not entirely sure if this explains it, but Markus (mrothe) reports
that he has mmap-0.5.x and importantly, a vanilla hashed-storage using
mmap-0.4.x.

I noticed that the Darcs mmap dependency was rather loose (>= 0.2). 
Markus reports that tightening this up fixes the problem.

So I've sent patch169, but I still don't know *what* exactly happened... :-/

Looks like if this is really it, we need to release darcs-2.4.1 very soon.
msg10104 (view) Author: mornfall Date: 2010-03-01.17:58:51
With unpatched hashed-storage, you can't build darcs against mmap 0.5 
(since it is not supported to link two versions of a single package into 
a single binary). So if the problem really is due to mmap version, it is 
only happening with unreleased versions, as far as I can tell. Or maybe 
with buggy ghc/cabal/whatever.
msg10111 (view) Author: wferi Date: 2010-03-03.16:58:02
Mostly guessing: then maybe it's an issue of Darcs with mmap-0.5, since
you (Petr) audited the hashed-storage patch, but nobody cared about
checking the compatibility of Darcs itself with mmap-0.5.
msg10116 (view) Author: twb Date: 2010-03-04.01:36:01
Ferenc Wágner wrote:
> 
> Ferenc Wágner <wferi@niif.hu> added the comment:
> 
> Mostly guessing: then maybe it's an issue of Darcs with mmap-0.5, since
> you (Petr) audited the hashed-storage patch, but nobody cared about
> checking the compatibility of Darcs itself with mmap-0.5.

What happens if you build Darcs 2.4 against an mmap-0.5-based
hashed-storage, but pass -f-mmap to Darcs?
msg10118 (view) Author: wferi Date: 2010-03-04.17:08:38
"Trent W. Buck" <bugs@darcs.net> writes:

> What happens if you build Darcs 2.4 against an mmap-0.5-based
> hashed-storage, but pass -f-mmap to Darcs?

Hell breaks loose:

stat64("_darcs/index_invalid", 0xb6f47330) = -1 ENOENT (No such file or directory)
stat64("_darcs/index", {st_mode=S_IFREG|0664, st_size=200, ...}) = 0
open("_darcs/index", O_RDONLY|O_NOCTTY|O_LARGEFILE) = 6
fstat64(6, {st_mode=S_IFREG|0664, st_size=200, ...}) = 0
mmap2(NULL, 4, PROT_READ, MAP_PRIVATE, 6, 0) = 0xb7791000
close(6)                    = 0
stat64("_darcs/index", {st_mode=S_IFREG|0664, st_size=200, ...}) = 0
stat64("_darcs/index", {st_mode=S_IFREG|0664, st_size=200, ...}) = 0
open("_darcs/index", O_RDWR|O_NOCTTY|O_LARGEFILE) = 6
fstat64(6, {st_mode=S_IFREG|0664, st_size=200, ...}) = 0
close(6)                    = 0
write(2, "darcs: "..., 7)   = 7
write(2, "mmap of '_darcs/index' failed, of"..., 109) = 109
write(2, "\n"..., 1)        = 1
darcs: mmap of '_darcs/index' failed, offset and size beyond end of file: does not exist (No such file or directory)

As far as I can see, we're in Darcs/Repository/State.hs:readIndex
starting with two doesFileExist (stat64) calls, then I.indexFormatValid
(open, fstat64, mmap2, close) returns True, thus finally I.readIndex
calling mmapIndex, doing another doesFileExist (stat64) and a
getFileStatus (stat64), then mmapFileForeignPtr calls mmapFilePtr, which
calls mmapFileOpen (open) then sanitizeFileRegion, which calls
c_system_io_file_size (fstat64) but doesn't like the size and offset
values.  The throwErrno function isn't appropriate here, as the error
has nothing to do with errno and the corresponding error string.

readIndex says: mmapIndex indexpath 0, so size becomes act_size there,
and mmapFileForeignPtr gets called with range (0,act_size+size_magic),
ie. a range size_magic (4) bytes longer than the file itself.  Thus
longsize<(offset + fromIntegral size) is true, and sanitizeFileRegion
throws the error.

I'd say we found a bug in hashed-storage's mmapIndex, probably exhibited
by the new mmap interface.  And another in mmap (throwErrno usage).
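
To make the arithmetic above concrete, here is a minimal sketch of the range check as described in this message. It is a model only: `checkRegion` and `requestedRange` are hypothetical names, not the actual code of mmap or hashed-storage, and the 200-byte file length is taken from the strace output. It also shows one way to report the failure without going through errno.

```haskell
import Data.Int (Int64)

-- | Length of the index header, named after size_magic in the thread.
sizeMagic :: Int
sizeMagic = 4

-- | Hypothetical stand-in for the range check: Right () when the region
-- fits in the file, otherwise a message that does not depend on errno.
checkRegion :: Int64 -> (Int64, Int) -> Either String ()
checkRegion fileLen (offset, size)
  | offset < 0 || size < 0 = Left "negative offset or size"
  | offset + fromIntegral size > fileLen =
      Left "offset and size beyond end of file"
  | otherwise = Right ()

-- | Range requested when req_size is 0, as described above:
-- offset 0 and a size of act_size + size_magic.
requestedRange :: Int64 -> (Int64, Int)
requestedRange actSize = (0, fromIntegral actSize + sizeMagic)
```

With the 200-byte index from the trace, `requestedRange 200` asks for 204 bytes from offset 0, so `checkRegion 200 (requestedRange 200)` yields the `Left` case.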
-- 
Regards,
Feri.
msg10121 (view) Author: wferi Date: 2010-03-05.17:59:07
Something like the change below would seem logical, and it even works to
some extent (but still only a shot in the dark):

hunk ./Storage/Hashed/Index.hs 212
   act_size <- if exist then fileSize `fmap` getFileStatus indexpath
                        else return 0
   let size :: Int
-      size = fromIntegral $
-                 if req_size > 0 then fromIntegral req_size else act_size
+      size = if req_size > 0 then req_size else fromIntegral act_size - size_magic
   case size of
     0 -> return (castForeignPtr nullForeignPtr, size)
     _ -> do (x, _, _) <- mmapFileForeignPtr indexpath
hunk ./Storage/Hashed/Index.hs 216
-                                         ReadWrite (Just (0, size + size_magic))
+                                         ReadWrite (Just (fromIntegral size_magic, size))
             return (x, size)
 
 data IndexM m = Index { mmap :: (ForeignPtr ())
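
As a rough check of the hunk above, the following sketch models only the range it would request when req_size is 0. `patchedRange` and `fits` are hypothetical helper names used for illustration, not functions from hashed-storage or mmap.

```haskell
import Data.Int (Int64)

-- | Length of the index header, named after size_magic in the patch.
sizeMagic :: Int
sizeMagic = 4

-- | (offset, size) the patched code would request when req_size is 0:
-- skip the header and map the remaining act_size - size_magic bytes.
patchedRange :: Int64 -> (Int64, Int)
patchedRange actSize =
  (fromIntegral sizeMagic, fromIntegral actSize - sizeMagic)

-- | True when the region lies entirely inside a file of fileLen bytes.
fits :: Int64 -> (Int64, Int) -> Bool
fits fileLen (offset, size) =
  offset >= 0 && size >= 0 && offset + fromIntegral size <= fileLen
```

For a 200-byte index the requested region is (4, 196), which ends exactly at the end of the file. Reading the hunk as posted, a missing index gives act_size 0 and therefore size -4, which no longer matches the `0 ->` branch; that case would probably need separate handling.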

Hmm, I see I lowered the priority of this bug. Sorry, it really wasn't
my intention, so I change it back to critical now...
Eek, maybe an email comment won't word-wrap the patch.
msg10122 (view) Author: mornfall Date: 2010-03-06.09:38:44
(1) Darcs with -f-mmap works just fine.
(2) I have never blessed the mmap-0.5 patch for hashed-storage. It's not 
even in HEAD, not to say anything about released versions of h-s. 
(Although I have used it locally without issues for a while.)
(3) If cabal happily builds darcs with two different mmap versions 
linked in, that's a cabal bug. By default, that does not happen. I have 
to specify --constraint 'mmap > 0.5' to darcs's configure to get a 
broken binary.

As for the size/offset bug in h-s, good catch -- you are right that the 
size_magic is added redundantly. Nevertheless, from the point of view of 
darcs *working*, this is a harmless bug, just making the index longer 
than strictly necessary.

However, I see that if you are using h-s with mmap-0.5, you have a bogus 
patch for that. Please see http://pastebin.ca/1806460 -- that's the 
patch I sent to Trent and that actually works. From the patch you 
propose, I see you don't have this one, therefore the breakage...
msg10123 (view) Author: kowey Date: 2010-03-06.11:15:18
We need to focus on getting this resolved from a user point of view.

Let's narrow down to what happens when you have a vanilla hashed-storage
from Hackage (ie. built against mmap-0.4). We do have users (mrothe and
presumably derrida) who exactly fit this description; and we have no
reason to believe that they are purposely jumping through any hoops to
build darcs against mmap-0.5.

So I think we have to (a) assume there is some sort of Cabal bug and (b)
take concrete action (patch169) to cope with this [because it ultimately
does not matter where the bug lies if our users are tripping over it].

What's odd is that we're not getting more reports about this.  Either
users are being really really passive (or haven't upgraded yet), or
there are some sort of preconditions that have to be fulfilled for this
to trigger, eg. you already have mmap-0.5 on your machine before you
cabal install darcs, or we're just missing something...
msg10126 (view) Author: mornfall Date: 2010-03-06.19:01:40
Eric, I don't disagree. I, however, can do little about it -- you 
probably need to talk to Reinier. I have done as much as I could 
diagnosing the problem. There are several options on dealing with it, 
but we need to pick one and proceed...

(1) Release h-s that uses (and requires) mmap >= 0.5 and a darcs 2.4.1 
requiring this h-s (and with same mmap dependency as h-s, i.e. >= 0.5 && 
< 0.6).
(2) Keep h-s as it is, release darcs 2.4.1 with mmap < 0.5 dependency.

I think Trent & other distribution folks would be happier about (1), 
even though (2) is a little safer and easier.

Ultimately however, this is something that needs coordination between
h-s and darcs, and I can only speak for h-s at this point. So if we settle 
on a solution (maybe even a different one than 1/2 above, I don't 
particularly care as long as it works), let me know and I can upload my 
part of the deal. A darcs release can follow shortly.
msg10129 (view) Author: twb Date: 2010-03-07.09:09:21
Petr Ročkai wrote:
> (1) Release h-s that uses (and requires) mmap >= 0.5 and a darcs
> 2.4.1 requiring this h-s (and with same mmap dependency as h-s,
> i.e. >= 0.5 && < 0.6).
>
> (2) Keep h-s as it is, release darcs 2.4.1 with mmap < 0.5
> dependency.
>
> I think Trent & other distribution folks would be happier about (1),
> even though (2) is a little safer and easier.

Debian currently has mmap 0.5.  We'd prefer not to downgrade Debian's
mmap to 0.4.x, but it's not a show stopper.  Ultimately Debian's
Haskell packaging focuses on "what the apps need", so if Darcs needs
0.4 then that's what Debian will go with.
msg10132 (view) Author: kowey Date: 2010-03-08.16:46:35
On Sat, Mar 06, 2010 at 19:01:45 +0000, Petr Ročkai wrote:
> (1) Release h-s that uses (and requires) mmap >= 0.5 and a darcs 2.4.1 
> requiring this h-s (and with same mmap dependency as h-s, i.e. >= 0.5 && 
> < 0.6).
> (2) Keep h-s as it is, release darcs 2.4.1 with mmap < 0.5 dependency.

By sheer conservatism, I think I'd vote for #2, personally.

Reinier, what do you think?

-- 
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9
msg10137 (view) Author: tux_rocker Date: 2010-03-09.19:18:45
Op maandag 08 maart 2010 17:46 schreef Eric Kow:
> Eric Kow <kowey@darcs.net> added the comment:
> On Sat, Mar 06, 2010 at 19:01:45 +0000, Petr Ročkai wrote:
> > (1) Release h-s that uses (and requires) mmap >= 0.5 and a darcs 2.4.1 
> > requiring this h-s (and with same mmap dependency as h-s, i.e. >= 0.5 && 
> > < 0.6).
> > (2) Keep h-s as it is, release darcs 2.4.1 with mmap < 0.5 dependency.
> 
> By sheer conservatism, I think I'd vote for #2, personally.
> 
> Reinier, what do you think?

I think we ought to do what other users of mmap in the Haskell world do. Or if 
they haven't decided yet, I'd release with a < 0.5 dependency if there is no 
killer feature in mmap 0.5. Changing one line in darcs.cabal is to be 
preferred on a stable branch to making more substantial changes in the actual 
code.

Reinier
msg10143 (view) Author: kowey Date: 2010-03-09.20:14:33
I've confirmed with Reinier that his recommendation was just about the
choice of mmap version in general and not about this bug specifically.

OK, so I'm going to wade in and assert that restricting to mmap < 0.5 is
the right way to fix this for Darcs 2.4.1 (and that patch169 should go in).

I don't think we can afford the risk of a new mmap in such a short time.

For Darcs 2.5, on the other hand, it would make sense to bump up.
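
For illustration, the restriction might look roughly like this in darcs.cabal. The lower bound is the loose `>= 0.2` mentioned earlier in this thread; the stanza layout is an assumption, and the real file lists many other dependencies.

```cabal
-- Illustrative fragment only; the surrounding stanza and the other
-- dependencies are omitted.
build-depends: mmap >= 0.2 && < 0.5
```

The upper bound keeps Darcs on the same mmap major version as the released hashed-storage, so the two packages cannot end up built against different mmap versions.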
msg10190 (view) Author: kowey Date: 2010-03-14.20:26:57
The following patch updated the status of issue1753 to be resolved:

* Resolve issue1753: restrict mmap to version used by hashed-storage. 
Ignore-this: a53ca223c957f80ff5b021fc6c2026d8
Looks like we'll have to be careful about synchronising the dependencies.
msg11378 (view) Author: kowey Date: 2010-06-13.17:00:43
The following patch updated the status of issue1753 to be resolved-in-stable:

* Resolve issue1753: restrict mmap to version used by hashed-storage. 
Ignore-this: a53ca223c957f80ff5b021fc6c2026d8
Looks like we'll have to be careful about synchronising the dependencies.
msg11400 (view) Author: duncan.coutts Date: 2010-06-13.22:04:10
On Sat, 2010-03-06 at 11:15 +0000, Eric Kow wrote:
> Eric Kow <kowey@darcs.net> added the comment:
> 
> We need to focus on getting this resolved from a user point of view.
> 
> Let's narrow down to what happens when you have a vanilla hashed-storage
> from Hackage (ie. built against mmap-0.4). We do have users (mrothe and
> presumably derrida) who exactly fit this description; and we have no
> reason to believe that they are purposely jumping through any hoops to
> build darcs against mmap-0.5.
> 
> So I think we have (a) assume there is some sort of Cabal bug 

If you suspect it is a Cabal bug then of course I'd appreciate knowing
the details, symptoms, expected behaviour etc.

Duncan
msg11403 (view) Author: kowey Date: 2010-06-13.22:20:49
On Sun, Jun 13, 2010 at 23:06:41 +0100, Duncan Coutts wrote:
> > So I think we have (a) assume there is some sort of Cabal bug 
> 
> If you suspect it is a Cabal bug then of course I'd appreciate knowing
> the details, symptoms, expected behaviour etc.

Filed at http://hackage.haskell.org/trac/hackage/ticket/700

Sorry for the silent user syndrome! (users have a bad habit of just
working around bugs and never reporting them; embarrassing that we
should do the same!)

-- 
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9
msg11413 (view) Author: duncan.coutts Date: 2010-06-14.12:49:10
On Sun, 2010-06-13 at 22:20 +0000, Eric Kow wrote:
> Eric Kow <kowey@darcs.net> added the comment:
> 
> On Sun, Jun 13, 2010 at 23:06:41 +0100, Duncan Coutts wrote:
> > > So I think we have (a) assume there is some sort of Cabal bug 
> > 
> > If you suspect it is a Cabal bug then of course I'd appreciate knowing
> > the details, symptoms, expected behaviour etc.
> 
> Filed at http://hackage.haskell.org/trac/hackage/ticket/700
> 
> Sorry for the silent user syndrome! (users have a bad habit of just
> working around bugs and never reporting them; embarrassing that we
> should do the same!)

So we think we've found the cause of the problem. See:

http://hackage.haskell.org/trac/hackage/ticket/700#comment:4
http://hackage.haskell.org/trac/hackage/ticket/701


Duncan
History
Date  User  Action  Args
2010-03-01 11:09:35  twb  create
2010-03-01 11:19:20  kowey  set  status: unknown -> needs-reproduction
    topic: + Regression
    nosy: + kowey, mornfall
    messages: + msg10091
    priority: urgent
    assignedto: mornfall
2010-03-01 11:21:17  kowey  set  messages: + msg10092
2010-03-01 11:58:10  mornfall  set  messages: + msg10093
2010-03-01 13:30:14  kowey  set  nosy: + wferi
    messages: + msg10096
2010-03-01 14:07:27  dequuvae  set  nosy: + dequuvae
2010-03-01 16:43:44  kowey  set  status: needs-reproduction -> has-patch
    nosy: + tux_rocker
    messages: + msg10102
    assignedto: mornfall -> (no value)
2010-03-01 17:58:53  mornfall  set  messages: + msg10104
2010-03-03 14:49:09  kowey  set  topic: + Target-2.4, Hashed
2010-03-03 14:49:26  kowey  set  priority: urgent -> critical
2010-03-03 16:58:04  wferi  set  priority: critical -> urgent
    messages: + msg10111
2010-03-04 01:36:03  twb  set  messages: + msg10116
2010-03-04 17:08:44  wferi  set  messages: + msg10118
2010-03-05 17:54:28  wferi  set  priority: urgent -> critical
    messages: + msg10120
2010-03-05 17:55:16  wferi  set  messages: - msg10120
2010-03-05 17:59:09  wferi  set  messages: + msg10121
2010-03-06 09:38:47  mornfall  set  priority: critical -> urgent
    messages: + msg10122
2010-03-06 11:15:22  kowey  set  nosy: + duncan
    messages: + msg10123
2010-03-06 19:01:44  mornfall  set  messages: + msg10126
2010-03-07 09:09:23  twb  set  messages: + msg10129
2010-03-08 16:46:37  kowey  set  messages: + msg10132
2010-03-08 17:29:59  kowey  set  assignedto: tux_rocker
2010-03-09 13:25:44  kowey  set  priority: urgent -> critical
2010-03-09 19:18:48  tux_rocker  set  messages: + msg10137
2010-03-09 20:14:35  kowey  set  messages: + msg10143
2010-03-14 20:27:01  kowey  set  status: has-patch -> resolved
    messages: + msg10190
2010-06-13 17:00:44  kowey  set  status: resolved -> resolved-in-stable
    messages: + msg11378
2010-06-13 22:04:11  duncan.coutts  set  nosy: + duncan.coutts
    messages: + msg11400
2010-06-13 22:20:50  kowey  set  messages: + msg11403
2010-06-14 12:49:11  duncan.coutts  set  messages: + msg11413
2010-06-15 21:31:14  admin  set  milestone: 2.4.x
2010-06-15 21:31:16  admin  set  topic: - Target-2.4
2010-06-15 22:14:25  admin  set  status: resolved-in-stable -> resolved
2010-06-15 22:14:26  admin  set  resolvedin: 2.5.0
2010-06-16 14:40:39  kowey  set  resolvedin: 2.5.0 -> 2.4.x