Created on 2010-03-01.11:09:35 by twb, last changed 2010-06-16.14:40:39 by kowey.
msg10088 (view) |
Author: twb |
Date: 2010-03-01.11:09:32 |
|
During ./Setup test, I get
darcs: opening of '_darcs/index' failed: does not exist (No such file or directory)
for lots of tests. Full transcript attached. I don't have time to
dig into this deeply today, sorry. Note that I've applied a patch
from mornfall to allow hashed-storage to build against mmap 0.5.x.
|
msg10091 (view) |
Author: kowey |
Date: 2010-03-01.11:19:16 |
|
It sounds like it's important to prioritise this one, as we've had 3
independent reports (including Trent) of something like this happening
with folks trying out darcs 2.4.
So two potential variables to explore here:
- GHC 6.12.1
- the 64 bit machine
Any other ideas?
I think we need somebody to get in touch with mrothe and derrida from
#darcs and go through some interview/debugging with them. Petr, may I
assign this to you, as it's index related?
|
msg10092 (view) |
Author: kowey |
Date: 2010-03-01.11:21:16 |
|
(Oh, an obvious third variable which I failed to notice would be the mmap
0.5.x patch, probably the first one to eliminate.)
|
msg10093 (view) |
Author: mornfall |
Date: 2010-03-01.11:58:09 |
|
A (32b) build with mmap 0.5.3 patch, darcs HEAD, hashed-storage HEAD
works for me. I will test released versions later.
|
msg10096 (view) |
Author: kowey |
Date: 2010-03-01.13:30:12 |
|
Wagner (hereby added) has had the dubious pleasure of being victim #4 to
this issue. He's provided us with some new hints: he has GHC <= 6.10.x,
and a 32 bit machine, but also the new mmap 0.5.x version of
hashed-storage. Moreover, things are working for him when he downgrades
back to the released hashed-storage.
He's pointed out that even though this does not explain the problem for
victims #1 and #2, at least trying to figure out what went wrong in this
easy-to-reproduce scenario (ie. with mmap-0.5.x) may provide hints for
the more general problem...
|
msg10102 (view) |
Author: kowey |
Date: 2010-03-01.16:43:41 |
|
I'm not entirely sure if this explains it, but Markus (mrothe) reports
that he has mmap-0.5.x and importantly, a vanilla hashed-storage using
mmap-0.4.x.
I noticed that the Darcs mmap dependency was rather loose (>= 0.2).
Markus reports that tightening this up fixes the problem.
So I've sent patch169, but I still don't know *what* exactly happened... :-/
Looks like if this is really it, we need to release darcs-2.4.1 very soon.
|
msg10104 (view) |
Author: mornfall |
Date: 2010-03-01.17:58:51 |
|
With unpatched hashed-storage, you can't build darcs against mmap 0.5
(since it is not supported to link two versions of a single package into
a single binary). So if the problem really is due to mmap version, it is
only happening with unreleased versions, as far as I can tell. Or maybe
with buggy ghc/cabal/whatever.
|
msg10111 (view) |
Author: wferi |
Date: 2010-03-03.16:58:02 |
|
Mostly guessing: then maybe it's an issue of Darcs with mmap-0.5, since
you (Petr) audited the hashed-storage patch, but nobody cared about
checking the compatibility of Darcs itself with mmap-0.5.
|
msg10116 (view) |
Author: twb |
Date: 2010-03-04.01:36:01 |
|
Ferenc Wágner wrote:
>
> Ferenc Wágner <wferi@niif.hu> added the comment:
>
> Mostly guessing: then maybe it's an issue of Darcs with mmap-0.5, since
> you (Petr) audited the hashed-storage patch, but nobody cared about
> checking the compatibility of Darcs itself with mmap-0.5.
What happens if you build Darcs 2.4 against an mmap-0.5-based
hashed-storage, but pass -f-mmap to Darcs?
|
msg10118 (view) |
Author: wferi |
Date: 2010-03-04.17:08:38 |
|
"Trent W. Buck" <bugs@darcs.net> writes:
> What happens if you build Darcs 2.4 against an mmap-0.5-based
> hashed-storage, but pass -f-mmap to Darcs?
Hell breaks loose:
stat64("_darcs/index_invalid", 0xb6f47330) = -1 ENOENT (No such file or directory)
stat64("_darcs/index", {st_mode=S_IFREG|0664, st_size=200, ...}) = 0
open("_darcs/index", O_RDONLY|O_NOCTTY|O_LARGEFILE) = 6
fstat64(6, {st_mode=S_IFREG|0664, st_size=200, ...}) = 0
mmap2(NULL, 4, PROT_READ, MAP_PRIVATE, 6, 0) = 0xb7791000
close(6) = 0
stat64("_darcs/index", {st_mode=S_IFREG|0664, st_size=200, ...}) = 0
stat64("_darcs/index", {st_mode=S_IFREG|0664, st_size=200, ...}) = 0
open("_darcs/index", O_RDWR|O_NOCTTY|O_LARGEFILE) = 6
fstat64(6, {st_mode=S_IFREG|0664, st_size=200, ...}) = 0
close(6) = 0
write(2, "darcs: "..., 7) = 7
write(2, "mmap of '_darcs/index' failed, of"..., 109) = 109
write(2, "\n"..., 1) = 1
darcs: mmap of '_darcs/index' failed, offset and size beyond end of file: does not exist (No such file or directory)
As far as I can see, we're in Darcs/Repository/State.hs:readIndex
starting with two doesFileExist (stat64) calls, then I.indexFormatValid
(open, fstat64, mmap2, close) returns True, thus finally I.readIndex
calling mmapIndex, doing another doesFileExist (stat64) and a
getFileStatus (stat64), then mmapFileForeignPtr calls mmapFilePtr, which
calls mmapFileOpen (open) then sanitizeFileRegion, which calls
c_system_io_file_size (fstat64) but doesn't like the size and offset
values. The throwErrno function isn't appropriate here, as the error
has nothing to do with errno and the corresponding error string.
readIndex says: mmapIndex indexpath 0, so size becomes act_size there,
and mmapFileForeignPtr gets called with range (0,act_size+size_magic),
ie. a range size_magic (4) bytes longer than the file itself. Thus
longsize<(offset + fromIntegral size) is true, and sanitizeFileRegion
throws the error.
I'd say we found a bug in hashed-storage's mmapIndex, probably exhibited
by the new mmap interface. And another in mmap (throwErrno usage).
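As a rough illustration of the bounds check described above, here is a minimal sketch. The names regionFits and checkRegion are hypothetical and are not the mmap package's actual sanitizeFileRegion; the only figures taken from this report are size_magic = 4 and the 200-byte index seen in the strace output. It also raises a plain IOError instead of going through errno, which is the behaviour argued for above.

```haskell
-- Hypothetical re-creation of the range check; the real code in the mmap
-- package uses different types and may differ in detail.

-- | Number of header bytes in the index, as quoted above (size_magic).
sizeMagic :: Int
sizeMagic = 4

-- | True when the requested (offset, size) range lies inside the file.
regionFits :: Integer -> (Integer, Int) -> Bool
regionFits fileLen (offset, size) =
  offset >= 0 && size >= 0 && offset + fromIntegral size <= fileLen

-- | Fail with an ordinary IOError when the range is too large. No errno
-- value is consulted, so no unrelated "No such file or directory" text
-- can end up in the message.
checkRegion :: FilePath -> Integer -> (Integer, Int) -> IO ()
checkRegion path fileLen range
  | regionFits fileLen range = return ()
  | otherwise = ioError (userError msg)
  where
    msg = "mmap of '" ++ path ++ "' failed: offset and size beyond end of file"
```

With the 200-byte index from the trace, regionFits 200 (0, 200 + sizeMagic) is False, which corresponds to the failing call, while regionFits 200 (0, 200) is True.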
--
Regards,
Feri.
|
msg10121 (view) |
Author: wferi |
Date: 2010-03-05.17:59:07 |
|
Something like the change below would seem logical, and it even works to
some extent (but still only a shot in the dark):
hunk ./Storage/Hashed/Index.hs 212
   act_size <- if exist then fileSize `fmap` getFileStatus indexpath
                        else return 0
   let size :: Int
-      size = fromIntegral $
-             if req_size > 0 then fromIntegral req_size else act_size
+      size = if req_size > 0 then req_size else fromIntegral act_size - size_magic
   case size of
     0 -> return (castForeignPtr nullForeignPtr, size)
     _ -> do (x, _, _) <- mmapFileForeignPtr indexpath
hunk ./Storage/Hashed/Index.hs 216
-                                            ReadWrite (Just (0, size + size_magic))
+                                            ReadWrite (Just (fromIntegral size_magic, size))
             return (x, size)

 data IndexM m = Index { mmap :: (ForeignPtr ())
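To make the arithmetic of this change easier to follow, the sketch below computes the (offset, length) range requested before and after it. The helper names are hypothetical; only size_magic = 4 and the two range expressions come from the hunks above. It illustrates the numbers only and does not show that the change is the right fix.

```haskell
-- Hypothetical helpers that mirror the two range expressions in the patch.

sizeMagic :: Int
sizeMagic = 4

-- | Range requested by the original code: (0, size + size_magic).
rangeBefore :: Int -> Int -> (Int, Int)
rangeBefore reqSize actSize = (0, size + sizeMagic)
  where
    size = if reqSize > 0 then reqSize else actSize

-- | Range requested by the proposed change: (size_magic, size), where size
-- now excludes the header when it is derived from the file size.
rangeAfter :: Int -> Int -> (Int, Int)
rangeAfter reqSize actSize = (sizeMagic, size)
  where
    size = if reqSize > 0 then reqSize else actSize - sizeMagic

-- | First byte offset past the end of a range.
rangeEnd :: (Int, Int) -> Int
rangeEnd (off, len) = off + len
```

For a 200-byte index and req_size = 0, rangeBefore gives (0, 204), which ends 4 bytes past the file, and rangeAfter gives (4, 196), which ends exactly at byte 200.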
Hmm, I see I lowered the priority of this bug. Sorry, it really wasn't
my intention, so I change it back to critical now...
Eek, maybe an email comment won't word-wrap the patch.
|
msg10122 (view) |
Author: mornfall |
Date: 2010-03-06.09:38:44 |
|
(1) Darcs with -f-mmap works just fine.
(2) I have never blessed the mmap-0.5 patch for hashed-storage. It's not
even in HEAD, not to say anything about released versions of h-s.
(Although I have used it locally without issues for a while.)
(3) If cabal happily builds darcs with two different mmap versions
linked in, that's a cabal bug. By default, that does not happen. I have
to specify --constraint 'mmap > 0.5' to darcs's configure to get a
broken binary.
As for the size/offset bug in h-s, good catch -- you are right that the
size_magic is added redundantly. Nevertheless, from the point of view of
darcs *working*, this is a harmless bug, just making the index longer
than strictly necessary.
However, I see that if you are using h-s with mmap-0.5, you have a bogus
patch for that. Please see http://pastebin.ca/1806460 -- that's the
patch I sent to Trent and that actually works. From the patch you
propose, I see you don't have this one, therefore the breakage...
|
msg10123 (view) |
Author: kowey |
Date: 2010-03-06.11:15:18 |
|
We need to focus on getting this resolved from a user point of view.
Let's narrow down to what happens when you have a vanilla hashed-storage
from Hackage (ie. built against mmap-0.4). We do have users (mrothe and
presumably derrida) who exactly fit this description; and we have no
reason to believe that they are purposely jumping through any hoops to
build darcs against mmap-0.5.
So I think we have to (a) assume there is some sort of Cabal bug and (b)
take concrete action (patch169) to cope with this [because it ultimately
does not matter where the bug lies if our users are tripping over it].
What's odd is that we're not getting more reports about this. Either
users are being really really passive (or haven't upgraded yet), or
there are some sort of pre-conditions that have to be fulfilled for this to
trigger, eg. you already have mmap-0.5 on your machine before you cabal
install darcs, or we're just missing something...
|
msg10126 (view) |
Author: mornfall |
Date: 2010-03-06.19:01:40 |
|
Eric, I don't disagree. I, however, can do little about it -- you
probably need to talk to Reinier. I have done as much as I could
diagnosing the problem. There are several options on dealing with it,
but we need to pick one and proceed...
(1) Release h-s that uses (and requires) mmap >= 0.5 and a darcs 2.4.1
requiring this h-s (and with same mmap dependency as h-s, i.e. >= 0.5 &&
< 0.6).
(2) Keep h-s as it is, release darcs 2.4.1 with mmap < 0.5 dependency.
I think Trent & other distribution folks would be happier about (1),
even though (2) is a little safer and easier.
Ultimately however, this is something that needs coordination between
h-s and darcs, and I can only speak for h-s at this point. So if we settle
on a solution (maybe even a different one than 1/2 above, I don't
particularly care as long as it works), let me know and I can upload my
part of the deal. A darcs release can follow shortly.
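For readers comparing the two options, here is a small sketch that models each dependency range as a predicate over versions, using Data.Version from base rather than the Cabal library. The names loose, option1 and option2 are hypothetical, and mmap 0.4.1 is used only as an illustrative 0.4.x version; 0.5.3 is the version mentioned earlier in this thread.

```haskell
import Data.Version (Version, makeVersion)

-- | A version predicate, standing in for a Cabal build-depends range.
type Range = Version -> Bool

-- | The loose constraint reported for darcs earlier in the thread: mmap >= 0.2.
loose :: Range
loose v = v >= makeVersion [0, 2]

-- | Option (1): mmap >= 0.5 && < 0.6.
option1 :: Range
option1 v = v >= makeVersion [0, 5] && v < makeVersion [0, 6]

-- | Option (2): mmap < 0.5.
option2 :: Range
option2 v = v < makeVersion [0, 5]

-- | Illustrative versions: a 0.4.x release and the 0.5.3 release.
mmapOld, mmapNew :: Version
mmapOld = makeVersion [0, 4, 1]
mmapNew = makeVersion [0, 5, 3]
```

Under these assumptions the loose range admits both mmapOld and mmapNew, so darcs and hashed-storage could end up resolved against different mmap versions. Option (1) admits only mmapNew and option (2) only mmapOld, so either one keeps the two packages on the same side of the 0.5 boundary.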
|
msg10129 (view) |
Author: twb |
Date: 2010-03-07.09:09:21 |
|
Petr Ročkai wrote:
> (1) Release h-s that uses (and requires) mmap >= 0.5 and a darcs
> 2.4.1 requiring this h-s (and with same mmap dependency as h-s,
> i.e. >= 0.5 && < 0.6).
>
> (2) Keep h-s as it is, release darcs 2.4.1 with mmap < 0.5
> dependency.
>
> I think Trent & other distribution folks would be happier about (1),
> even though (2) is a little safer and easier.
Debian currently has mmap 0.5. We'd prefer not to downgrade Debian's
mmap to 0.4.x, but it's not a show stopper. Ultimately Debian's
Haskell packaging focuses on "what the apps need", so if Darcs needs
0.4 then that's what Debian will go with.
|
msg10132 (view) |
Author: kowey |
Date: 2010-03-08.16:46:35 |
|
On Sat, Mar 06, 2010 at 19:01:45 +0000, Petr Ročkai wrote:
> (1) Release h-s that uses (and requires) mmap >= 0.5 and a darcs 2.4.1
> requiring this h-s (and with same mmap dependency as h-s, i.e. >= 0.5 &&
> < 0.6).
> (2) Keep h-s as it is, release darcs 2.4.1 with mmap < 0.5 dependency.
By sheer conservatism, I think I'd vote for #2, personally.
Reinier, what do you think?
--
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9
|
msg10137 (view) |
Author: tux_rocker |
Date: 2010-03-09.19:18:45 |
|
On Monday 08 March 2010 17:46, Eric Kow wrote:
> Eric Kow <kowey@darcs.net> added the comment:
> On Sat, Mar 06, 2010 at 19:01:45 +0000, Petr Ročkai wrote:
> > (1) Release h-s that uses (and requires) mmap >= 0.5 and a darcs 2.4.1
> > requiring this h-s (and with same mmap dependency as h-s, i.e. >= 0.5 &&
> > < 0.6).
> > (2) Keep h-s as it is, release darcs 2.4.1 with mmap < 0.5 dependency.
>
> By sheer conservatism, I think I'd vote for #2, personally.
>
> Reinier, what do you think?
I think we ought to do what other users of mmap in the Haskell world do. Or if
they haven't decided yet, I'd release with a < 0.5 dependency if there is no
killer feature in mmap 0.5. Changing one line in darcs.cabal is to be
preferred on a stable branch to making more substantial changes in the actual
code.
Reinier
|
msg10143 (view) |
Author: kowey |
Date: 2010-03-09.20:14:33 |
|
I've confirmed with Reinier that his recommendation was just about the
choice of mmap version in general and not about this bug specifically.
OK, so I'm going to wade in and assert that restricting to mmap < 0.5 is
the right way to fix this for Darcs 2.4.1 (and that patch169 should go in).
I don't think we can afford the risk of a new mmap in such a short time.
For Darcs 2.5, on the other hand, it would make sense to bump up.
|
msg10190 (view) |
Author: kowey |
Date: 2010-03-14.20:26:57 |
|
The following patch updated the status of issue1753 to be resolved:
* Resolve issue1753: restrict mmap to version used by hashed-storage.
Ignore-this: a53ca223c957f80ff5b021fc6c2026d8
Looks like we'll have to be careful about synchronising the dependencies.
|
msg11378 (view) |
Author: kowey |
Date: 2010-06-13.17:00:43 |
|
The following patch updated the status of issue1753 to be resolved-in-stable:
* Resolve issue1753: restrict mmap to version used by hashed-storage.
Ignore-this: a53ca223c957f80ff5b021fc6c2026d8
Looks like we'll have to be careful about synchronising the dependencies.
|
msg11400 (view) |
Author: duncan.coutts |
Date: 2010-06-13.22:04:10 |
|
On Sat, 2010-03-06 at 11:15 +0000, Eric Kow wrote:
> Eric Kow <kowey@darcs.net> added the comment:
>
> We need to focus on getting this resolved from a user point of view.
>
> Let's narrow down to what happens when you have a vanilla hashed-storage
> from Hackage (ie. built against mmap-0.4). We do have users (mrothe and
> presumably derrida) who exactly fit this description; and we have no
> reason to believe that they are purposely jumping through any hoops to
> build darcs against mmap-0.5.
>
> So I think we have (a) assume there is some sort of Cabal bug
If you suspect it is a Cabal bug then of course I'd appreciate knowing
the details, symptoms, expected behaviour etc.
Duncan
|
msg11403 (view) |
Author: kowey |
Date: 2010-06-13.22:20:49 |
|
On Sun, Jun 13, 2010 at 23:06:41 +0100, Duncan Coutts wrote:
> > So I think we have (a) assume there is some sort of Cabal bug
>
> If you suspect it is a Cabal bug then of course I'd appreciate knowing
> the details, symptoms, expected behaviour etc.
Filed at http://hackage.haskell.org/trac/hackage/ticket/700
Sorry for the silent user syndrome! (users have a bad habit of just
working around bugs and never reporting them; embarrassing that we
should do the same!)
--
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9
|
msg11413 (view) |
Author: duncan.coutts |
Date: 2010-06-14.12:49:10 |
|
On Sun, 2010-06-13 at 22:20 +0000, Eric Kow wrote:
> Eric Kow <kowey@darcs.net> added the comment:
>
> On Sun, Jun 13, 2010 at 23:06:41 +0100, Duncan Coutts wrote:
> > > So I think we have (a) assume there is some sort of Cabal bug
> >
> > If you suspect it is a Cabal bug then of course I'd appreciate knowing
> > the details, symptoms, expected behaviour etc.
>
> Filed at http://hackage.haskell.org/trac/hackage/ticket/700
>
> Sorry for the silent user syndrome! (users have a bad habit of just
> working around bugs and never reporting them; embarrassing that we
> should do the same!)
So we think we've found the cause of the problem. See:
http://hackage.haskell.org/trac/hackage/ticket/700#comment:4
http://hackage.haskell.org/trac/hackage/ticket/701
Duncan
|
|
Date | User | Action | Args
2010-03-01 11:09:35 | twb | create |
2010-03-01 11:19:20 | kowey | set | status: unknown -> needs-reproduction; priority: urgent; nosy: + kowey, mornfall; messages: + msg10091; topic: + Regression; assignedto: mornfall
2010-03-01 11:21:17 | kowey | set | messages: + msg10092
2010-03-01 11:58:10 | mornfall | set | messages: + msg10093
2010-03-01 13:30:14 | kowey | set | nosy: + wferi; messages: + msg10096
2010-03-01 14:07:27 | dequuvae | set | nosy: + dequuvae
2010-03-01 16:43:44 | kowey | set | status: needs-reproduction -> has-patch; nosy: + tux_rocker; messages: + msg10102; assignedto: mornfall -> (no value)
2010-03-01 17:58:53 | mornfall | set | messages: + msg10104
2010-03-03 14:49:09 | kowey | set | topic: + Target-2.4, Hashed
2010-03-03 14:49:26 | kowey | set | priority: urgent -> critical
2010-03-03 16:58:04 | wferi | set | priority: critical -> urgent; messages: + msg10111
2010-03-04 01:36:03 | twb | set | messages: + msg10116
2010-03-04 17:08:44 | wferi | set | messages: + msg10118
2010-03-05 17:54:28 | wferi | set | priority: urgent -> critical; messages: + msg10120
2010-03-05 17:55:16 | wferi | set | messages: - msg10120
2010-03-05 17:59:09 | wferi | set | messages: + msg10121
2010-03-06 09:38:47 | mornfall | set | priority: critical -> urgent; messages: + msg10122
2010-03-06 11:15:22 | kowey | set | nosy: + duncan; messages: + msg10123
2010-03-06 19:01:44 | mornfall | set | messages: + msg10126
2010-03-07 09:09:23 | twb | set | messages: + msg10129
2010-03-08 16:46:37 | kowey | set | messages: + msg10132
2010-03-08 17:29:59 | kowey | set | assignedto: tux_rocker
2010-03-09 13:25:44 | kowey | set | priority: urgent -> critical
2010-03-09 19:18:48 | tux_rocker | set | messages: + msg10137
2010-03-09 20:14:35 | kowey | set | messages: + msg10143
2010-03-14 20:27:01 | kowey | set | status: has-patch -> resolved; messages: + msg10190
2010-06-13 17:00:44 | kowey | set | status: resolved -> resolved-in-stable; messages: + msg11378
2010-06-13 22:04:11 | duncan.coutts | set | nosy: + duncan.coutts; messages: + msg11400
2010-06-13 22:20:50 | kowey | set | messages: + msg11403
2010-06-14 12:49:11 | duncan.coutts | set | messages: + msg11413
2010-06-15 21:31:14 | admin | set | milestone: 2.4.x
2010-06-15 21:31:16 | admin | set | topic: - Target-2.4
2010-06-15 22:14:25 | admin | set | status: resolved-in-stable -> resolved
2010-06-15 22:14:26 | admin | set | resolvedin: 2.5.0
2010-06-16 14:40:39 | kowey | set | resolvedin: 2.5.0 -> 2.4.x