darcs

Issue 2364 missing pristine file with GHC 7.8

Title missing pristine file with GHC 7.8
Priority critical Status resolved
Milestone Resolved in 2.10.0
Superseder Nosy List dolio, ganesh, gh
Assigned To
Topics

Created on 2014-03-29.16:58:47 by gh, last changed 2014-06-18.19:52:29 by noreply.

Messages
msg17244 (view) Author: gh Date: 2014-03-29.16:58:46
As reported by dolio on IRC
<http://irclog.perlgeek.de/darcs/2014-03-26#i_8496431> :

<dolio> Heffalump: 7.6.3 dependencies: http://lpaste.net/7848907365976899584
<dolio> 7.8 dependencies: http://lpaste.net/6243537088533233664
<dolio> Differences are newer array, base, binary, bytestring,
containers, directory (slightly), filepath (slightly), oldtime
(slightly), process, and unix. mtl and text are older in my 7.8 install,
but that probably means nothing.
<dolio> It's really easy to reproduce, at least on my machine.
<dolio> darcs get http://hub.darcs.net/dolio/bytestring-show ; cd
bytestring-show ; darcs whatsnew
<dolio> That complains about a missing pristine file.

[...]

<dolio> notdan: Obviously someone underestimated the filepath 1.3.0.1 to
1.3.0.2 change when calculating the new version number.
msg17246 (view) Author: dolio Date: 2014-03-30.03:35:34
I should note: that last line was a joke. I have no idea why this
happens, and it seems unlikely that whatever changed in filepath is the
culprit.

My first guess was something CPP-induced, but there don't seem to be any
7.8 CPP checks in the darcs code proper (only in tests, I think).
msg17363 (view) Author: trofi Date: 2014-04-19.19:29:22
My fastest reproducer now is the following:

    darcs.net # (cd dist; rm -rf Agda ~/.cache/darcs/; strace -f -olog
build/darcs/darcs get --debug-verbose http://code.haskell.org/Agda; )

After 5-10 seconds the error is:

    Couldn't fetch
`b5153164a1034be95d7eaabdd69a90c3ec6896a39d4d53ecfa915679d54ce985'
    in subdir pristine.hashed from sources:

sf darcs.net # grep
b5153164a1034be95d7eaabdd69a90c3ec6896a39d4d53ecfa915679d54ce985 dist/log 
    26142
stat("/tmp/darcs.net/dist/Agda/_darcs/pristine.hashed/b5153164a1034be95d7eaabdd69a90c3ec6896a39d4d53ecfa915679d54ce985",
 <unfinished ...>
    26142
stat("/root/.cache/darcs/pristine.hashed/b5153164a1034be95d7eaabdd69a90c3ec6896a39d4d53ecfa915679d54ce985",
0x7fd770875d30) = -1 ENOENT (No such file or directory)
    26143
open("/root/.cache/darcs/pristine.hashed/b5153164a1034be95d7eaabdd69a90c3ec6896a39d4d53ecfa915679d54ce985-new_45186a86b21849f69c948c18d3510725",
O_WRONLY|O_CREAT|O_TRUNC, 0666 <unfinished ...>
    26143
lstat("/root/.cache/darcs/pristine.hashed/b5153164a1034be95d7eaabdd69a90c3ec6896a39d4d53ecfa915679d54ce985-new_45186a86b21849f69c948c18d3510725",
{st_mode=S_IFREG|0644, st_size=138, ...}) = 0
    26143
rename("/root/.cache/darcs/pristine.hashed/b5153164a1034be95d7eaabdd69a90c3ec6896a39d4d53ecfa915679d54ce985-new_45186a86b21849f69c948c18d3510725",
"/root/.cache/darcs/pristine.hashed/b5153164a1034be95d7eaabdd69a90c3ec6896a39d4d53ecfa915679d54ce985")
= 0
    26142
stat("/tmp/darcs.net/dist/Agda/_darcs/pristine.hashed/b5153164a1034be95d7eaabdd69a90c3ec6896a39d4d53ecfa915679d54ce985",
0x7fd7708778b0) = -1 ENOENT (No such file or directory)
    26142
stat("/root/.cache/darcs/pristine.hashed/b5153164a1034be95d7eaabdd69a90c3ec6896a39d4d53ecfa915679d54ce985",
{st_mode=S_IFREG|0644, st_size=138, ...}) = 0
    26142
unlink("/root/.cache/darcs/pristine.hashed/b5153164a1034be95d7eaabdd69a90c3ec6896a39d4d53ecfa915679d54ce985-new_45186a86b21849f69c948c18d3510725")
= -1 ENOENT (No such file or directory)

What is strange here is that rename/stat are done in different processes
(likely simultaneously):
    26143
rename("/root/.cache/darcs/pristine.hashed/b5153164a1034be95d7eaabdd69a90c3ec6896a39d4d53ecfa915679d54ce985-new_45186a86b21849f69c948c18d3510725",
"/root/.cache/darcs/pristine.hashed/b5153164a1034be95d7eaabdd69a90c3ec6896a39d4d53ecfa915679d54ce985")
= 0
    26142
stat("/tmp/darcs.net/dist/Agda/_darcs/pristine.hashed/b5153164a1034be95d7eaabdd69a90c3ec6896a39d4d53ecfa915679d54ce985",
0x7fd7708778b0) = -1 ENOENT (No such file or directory)
Looks like a race condition due to ghc-7.8 IO manager changes.
msg17364 (view) Author: trofi Date: 2014-04-19.20:44:44
Looks like this sequence works fine in ghc-7.6, but not 7.8:

downloadComplete URL ok:
("http://code.haskell.org/Agda/_darcs/pristine.hashed/b5153164a1034be95d7eaabdd69a90c3ec6896a39d4d53ecfa915679d54ce985","")
URL.waitNextUrl succeeded:
http://code.haskell.org/Agda/_darcs/pristine.hashed/b5153164a1034be95d7eaabdd69a90c3ec6896a39d4d53ecfa915679d54ce985
./darcs4367
CH:
("b5153164a1034be95d7eaabdd69a90c3ec6896a39d4d53ecfa915679d54ce985","\US\139\b\NUL\NUL\NUL\NUL\NUL\NUL\ETXM\140A\n\195\&0\f\EOT\239z\133\207\129\232P\250\129@\DEL\144\a\EOT!\171\174\169\DC3\ESCK\162\208C\223\222\146\DC2\200mv`v\137\212Y\SOH\a\212W\131\225\ETX\213\r\200\173\174W\DC3d\226\135\EOT\174\219='\239\a\161\SUB\153\235\177JM`k\131)E\186\160Q\199\244\ACK\188\205\203l\181\v\168\152\183\DLE\179\SUB\224\206\227\191\v\152\183\223Q)\DC2\199\246L'\219\n\177\156\FS|\SOH\ACK\197+\188\165\NUL\NUL\NUL")
CHV:
("b5153164a1034be95d7eaabdd69a90c3ec6896a39d4d53ecfa915679d54ce985","58dc4ebe3bed8b1d4a2f4641f70ff3ee000a938a774d9918e50531d791811bf2")
Hash failure in
http://code.haskell.org/Agda/_darcs/pristine.hashed/b5153164a1034be95d7eaabdd69a90c3ec6896a39d4d53ecfa915679d54ce985
msg17365 (view) Author: trofi Date: 2014-04-19.20:51:50
The same from ghc-7.6:

CH:
("b5153164a1034be95d7eaabdd69a90c3ec6896a39d4d53ecfa915679d54ce985","_darcs\n.*.swp\n*~\nout\nautom4te.cache
configure config.statu
s config.log\ntmp\nAgda2.tar.gz\n.DS_Store\nsetup dist\n.setup-config
.installed-pkg-config .inplace-pkg-config\n")
CHV:
("b5153164a1034be95d7eaabdd69a90c3ec6896a39d4d53ecfa915679d54ce985","b5153164a1034be95d7eaabdd69a90c3ec6896a39d4d53ecfa915679d54ce
985")
msg17366 (view) Author: trofi Date: 2014-04-19.20:52:42
The one is gzip strem, another is not.
msg17415 (view) Author: trofi Date: 2014-04-30.09:10:07
Sent a patch to ML.

Here is a copy just in case:
http://code.haskell.org/~slyfox/darcs-HEAD-issue2364-fix.patch.txt
msg17417 (view) Author: ganesh Date: 2014-05-01.19:58:05
The patch is available at patch1149
msg17420 (view) Author: ganesh Date: 2014-05-01.22:14:11
The fix for this is now in the repo at http://darcs.net - I haven't 
relaxed the GHC 7.8 restriction just yet, but if anyone wants to try it 
just grab that repo and edit the build-tools line in darcs.cabal.
msg17451 (view) Author: noreply Date: 2014-05-13.21:44:04
The following patch sent by Sergei Trofimovich <slyfox@community.haskell.org> updated issue issue2364 with
status=resolved;resolvedin=2.10.0 HEAD

* resolve issue2364: don't break list of 'bad sources' 
Ignore-this: d68861b39305be9b7c77e9a6fca89601

This time the bug manifested on a simple operation:
$ darcs record -a -m "something"

Attempt to write a patch resulted in something like:
  Failed to record patch 'hello'

  HINT: I could not reach the following repositories:
        http://repetae.net/repos/jhc
        /home/st/.darcs/cache
        /home/st/.cache/darcs
        /home/st/dev/darcs/jhc
      If you're not using them, you should probably delete

The sequence should be the following:
    1. store patch to inventory/foo
    2. try to store to a writable cache (say, ~/.darcs/cache/patches)
    3. fail to write
    4. filter out bad caches
    5. try again
    6. copy from cache to patches/

Due to missing NOINLINE step 4. led to
all caches treated as writable, thus step 5
failed without a chance for patch to
go to 'patches/'.

As a side-effect building darcs with -O0 produced seemingly working darcs.
Reported-by: Ivan Miljenovic
msg17554 (view) Author: noreply Date: 2014-06-18.19:52:27
The following patch sent by Sergei Trofimovich <slyfox@community.haskell.org> updated issue issue2364 with
status=resolved;resolvedin=2.10.0 HEAD

* resolve issue2364: fix file corruption on double fetch 
Ignore-this: 9b069f38724673da4ff427f538132c24

The bug is the result of attempt to fetch the same file
(say F) by the same URL (U) multiple times concurrently.

First time U gets fetched by speculative prefetch logic.
Second time as an ordinary file (while first fetch is not finished).

The function 'copyUrlWithPriority' sends download request
to 'urlChan' both times (it's already not a nice situation,
fixed by this patch).

Later urlThread satisfies first request, notifies receiver,
and starts downloading exactly the same U again.

I don't know exact data corruption mechanics yet, but it has
to do with non-random intermediate file names of downloaded
files and 'truncate' call when temp file is opened for a new
downlaod job.

All temp names are completely non-random for a single darcs run:

  urlThread :: Chan UrlRequest -> IO ()
  urlThread ch = do
    junk <- flip showHex "" `fmap` randomRIO rrange
    evalStateT urlThread' (UrlState Map.empty emptyQ 0 junk)

  createDownloadFileName :: FilePath -> UrlState -> FilePath
  createDownloadFileName f st = f ++ "-new_" ++ randomJunk st

My theory is next download manages to step on toes of previous job.

I'll try to make file names truly random in other patch.
That way such errors should manifest as read erros instead of data
corruption.

Thanks!
History
Date User Action Args
2014-03-29 16:58:47ghcreate
2014-03-30 03:35:35doliosetnosy: + dolio
messages: + msg17246
2014-04-19 19:29:24trofisetmessages: + msg17363
2014-04-19 20:44:45trofisetmessages: + msg17364
2014-04-19 20:51:52trofisetmessages: + msg17365
2014-04-19 20:52:43trofisetmessages: + msg17366
2014-04-30 09:10:09trofisetstatus: unknown -> has-patch
messages: + msg17415
2014-05-01 19:58:06ganeshsetnosy: + ganesh
messages: + msg17417
2014-05-01 19:58:16ganeshsetpriority: critical
2014-05-01 22:14:12ganeshsetmessages: + msg17420
2014-05-13 21:44:06noreplysetstatus: has-patch -> resolved
messages: + msg17451
resolvedin: 2.10.0
2014-06-18 19:52:29noreplysetmessages: + msg17554