darcs

Issue 1972 git/hg/bzr as darcs clients

Title: git/hg/bzr as darcs clients
Priority: feature
Status: given-up
Milestone:
Resolved in:
Superseder:
Nosy List: darcs-devel, dmitry.kurochkin, kowey, simon, twb, warner-darcs-bugs
Assigned To:
Topics: Community, Documentation

Created on 2010-10-14.09:17:30 by kowey, last changed 2017-07-31.01:28:08 by gh.

Messages
msg12692 Author: kowey Date: 2010-10-14.09:17:28
It's important that Darcs adoption be risk-free for projects, but due to
issue1971, I think it's important that we be aware of techniques
people can use to talk to Darcs repositories via the big 3 DVCSes.

Note that this is deliberately an asymmetrical ticket (Darcs as a Git
client is a different story). It's also not necessarily a technical
ticket.  Maybe all we need is a single page on the Darcs wiki pointing
to easy recipes for contributing to Darcs repositories via other DVCSes.

See also the related issue1627, which is a bit of a long-term dream, but
not the same problem.  This is a more immediate issue.
msg12693 Author: kowey Date: 2010-10-14.09:25:02
Oops, sorry for silently assigning this ticket to you, Brian!  I hear
from Zooko that you contribute to Tahoe-LAFS via Git.  Could you say a
couple of words about how you do this? Is it with something like Tailor?

My goal is to have a page like http://wiki.darcs.net/Bridge
msg12762 Author: warner-darcs-bugs Date: 2010-10-17.19:13:02
Sure thing. There's a format called "fast-import" which originated
with Git, I believe, but is probably useful as a pan-DVCS interchange
format (at least among the revision-based systems; Darcs is in a
different category, so there's somewhat of an impedance mismatch).
It's a simple text-based stream of information about each change,
expressed as commands in a simple language, like
ADD(filename,contents), DELETE(filename), DELETEALL, RENAME(old,new),
etc. There is a collection of tools to emit and consume this stream,
usually named "*-fast-import" or "*-fast-export".
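
As a rough illustration of the stream described above (not taken from
any of the tools mentioned), here is a minimal Python sketch that
writes one commit in git's fast-import syntax, where ADD, DELETE,
DELETEALL and RENAME are spelled "M", "D", "deleteall" and "R". The
helper names, the committer identity and the file names are made up
for the example, and paths are assumed to contain no spaces or other
characters that would need quoting.

```python
def data_block(payload: bytes) -> bytes:
    """Encode a payload in the exact-byte-count 'data' form."""
    return b"data %d\n" % len(payload) + payload + b"\n"


def commit(ref, mark, committer, when, message, changes):
    """Build the bytes for a single fast-import 'commit' command.

    `changes` is a list of tuples:
      ("add", path, contents_bytes), ("delete", path),
      ("deleteall",), ("rename", old_path, new_path)
    """
    out = [
        b"commit %s\n" % ref.encode(),
        b"mark :%d\n" % mark,
        # 'when' is seconds since the epoch; the timezone is fixed to UTC here
        b"committer %s %d +0000\n" % (committer.encode(), when),
        data_block(message.encode()),
    ]
    for change in changes:
        kind = change[0]
        if kind == "add":
            out.append(b"M 100644 inline %s\n" % change[1].encode())
            out.append(data_block(change[2]))
        elif kind == "delete":
            out.append(b"D %s\n" % change[1].encode())
        elif kind == "deleteall":
            out.append(b"deleteall\n")
        elif kind == "rename":
            out.append(b"R %s %s\n" % (change[1].encode(), change[2].encode()))
        else:
            raise ValueError("unknown change kind: %r" % (kind,))
    return b"".join(out)


# Hypothetical example: wipe the tree, then re-add one file (a full snapshot).
stream = commit(
    "refs/heads/master", 1, "A U Thor <author@example.org>", 0,
    "initial import",
    [("deleteall",), ("add", "README", b"hello\n")],
)
```

Writing `stream` to the standard input of "git fast-import" inside an
empty repository should create that one commit.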

The one I'm using is called "darcs-fast-export". It started with
Miklos Vajna (http://vmiklos.hu/project/darcs-fast-export/), was
merged into bzr-fastimport (https://launchpad.net/bzr-fastimport),
but sadly the maintainer of bzr-fastimport passed away. The most
recent source I've seen is on Miklos' mirror at
git://vmiklos.hu/bzr-fastimport . I have a few patches of my own
(rather old, I don't know if they've been merged elsewhere) at
http://github.com/warner/darcs-fast-export .

darcs-fast-export installs a new subcommand for git (stored as a
program named "git-darcs" somewhere on your $PATH, so when you run
'git darcs', /usr/bin/git winds up invoking /usr/bin/git-darcs). It
also installs two programs named "darcs-fast-export" and
"darcs-fast-import", which are run on a darcs repo and create/consume
the fast-import stream. These commands understand "mark files", which
record numbered revision/patch identifiers, so you can re-start the
process later and skip over the work that was already done.
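
A sketch of how a mark file lets a later run skip finished work. It
assumes the git-style layout of one ":<number> <identifier>" pair per
line; the exact format darcs-fast-export writes may differ, and the
function names and patch identifiers here are hypothetical.

```python
def load_marks(text):
    """Parse mark-file text into {mark_number: identifier}."""
    marks = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        mark, ident = line.split(" ", 1)
        marks[int(mark.lstrip(":"))] = ident
    return marks


def pending(all_patch_ids, marks):
    """Return the patches (in order) that have no mark recorded yet."""
    done = set(marks.values())
    return [p for p in all_patch_ids if p not in done]


# Hypothetical example: two patches were converted on an earlier run.
marks = load_marks(":1 patch-aaa\n:2 patch-bbb\n")
todo = pending(["patch-aaa", "patch-bbb", "patch-ccc"], marks)
```

Here `todo` holds only the third patch, so an incremental run would
start there and number its next mark after the highest one on file.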

The bidirectional bridge I use for Tahoe has a tahoe.darcs and a
tahoe.git sitting next to each other. There's a simple Makefile with
"push" and "pull" targets. The tahoe.git repo was set up with
something like:

 mkdir tahoe.git
 cd tahoe.git
 git init
 git darcs add upstream ../tahoe.darcs --encoding=utf-8 \
   --authors-file=../authormap

(oh yeah, there's an 'authormap' which manages conversion of author
names from darcs' metadata into git's)
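
The format of the authormap isn't shown here. The sketch below
assumes one "darcs author = Name <email>" entry per line (in the
style of git-svn authors files), which may not match what
darcs-fast-export actually reads; the names, addresses and fallback
rule are invented for illustration.

```python
def load_authormap(text):
    """Parse 'darcs author = Git Name <email>' lines into a dict."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        mapping[key.strip()] = value.strip()
    return mapping


def git_author(darcs_author, mapping):
    """Translate a darcs author string into git's 'Name <email>' form."""
    if darcs_author in mapping:
        return mapping[darcs_author]
    if "<" in darcs_author and darcs_author.rstrip().endswith(">"):
        return darcs_author  # already in the shape git expects
    # Fallback (an assumption): treat the string as a bare address and
    # use the part before '@' as the display name.
    return "%s <%s>" % (darcs_author.split("@")[0], darcs_author)


authormap = load_authormap(
    "# darcs author = git author\n"
    "alice@example.org = Alice Example <alice@example.org>\n"
)
```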

The 'pull' target then does:

 cd tahoe.darcs && darcs pull -a
 cd tahoe.git && git darcs pull upstream

and 'push' does:

 cd tahoe.git && git darcs push upstream
 cd tahoe.darcs && darcs push -a

(the tahoe.darcs repo has a _darcs/prefs/defaultrepo that points to
the canonical Tahoe darcs repo, so that step is what triggers the
buildbot, publishes it to the world, etc)

I make sure to do a 'pull' before doing a 'push'. I also have some
scripts to check all the internal repos for uncommitted changes after
either, to guard against some of the bugs below, so I know when to
step in and fix things up manually.
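
The pull-before-push routine above could be driven by something like
the following Python sketch instead of a Makefile. The commands are
the ones quoted in the targets; the function names and the
stop-on-first-failure policy are assumptions for the example, not a
description of the real Makefile.

```python
import subprocess

# (working directory, command) pairs, copied from the two targets above.
PLANS = {
    "pull": [
        ("tahoe.darcs", ["darcs", "pull", "-a"]),
        ("tahoe.git", ["git", "darcs", "pull", "upstream"]),
    ],
    "push": [
        ("tahoe.git", ["git", "darcs", "push", "upstream"]),
        ("tahoe.darcs", ["darcs", "push", "-a"]),
    ],
}


def sync_plan():
    """Always pull before pushing, so both sides are current first."""
    return PLANS["pull"] + PLANS["push"]


def run(plan, runner=subprocess.run):
    """Run each step in order; check=True aborts on the first failure."""
    for cwd, argv in plan:
        runner(argv, cwd=cwd, check=True)
```

Calling `run(sync_plan())` from the directory that holds both repos
would execute the four steps in order. Passing a different `runner`
makes it possible to dry-run the plan without touching either repo.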

Problems:

There are a number of bugs in darcs-fast-import that I've had to work
around. I tried to fix a few, but was not always successful. Here are
the ones I can remember:

 1: bad UTF-8 in darcs commit messages. We have one early commit,
    from around 2007, that has a spurious \xc2 in a checkin comment.
    (I suspect that modern versions of darcs will prevent this, but
    back then, maybe not). darcs-fast-export uses 'darcs changes
    --xml' to find out what work needs to be done, and of course runs
    an XML parser on the output. The stray \xc2 caused the parser to
    fail. My workaround was to replace the \xc2 with "X" before
    passing it to the XML parser: ugly but functional. If I were
    re-running the conversion now, I'd make the replacement be much
    more specific, so that it would only ever spot the \xc2 in that
    particular comment, and just delete the \xc2 completely.

 2: darcs weirdness while applying patches: darcs-fast-export appears
    to work by reading patch metadata out of the XML output, using
    that to locate a file in _darcs/patches/PATCHID.gz, combining the
    metadata with the patch.gz to construct a stream that is fed into
    "darcs apply" on an internal darcs repo, and walking the
    newly-modified darcs tree to locate files that have changed, then
    putting those files and their contents into the fast-export
    command stream. (actually, I think it puts files/contents of
    *all* files into that stream, rather than trying to figure out a
    minimal subset that have changed). I had at least one instance
    where the 'darcs apply' failed because of uncommitted changes
    left over by a previous apply, probably when conflicts or their
    resolving patches left the tree in a weird state. I believe I had
    to do a 'darcs repair' on the internal repo to get it to
    continue.

 3: not removing all deleted files in the darcs->git direction. The
    original darcs-fast-export script behaved correctly for full
    conversions, but failed to emit all the necessary DELETE()
    commands when doing a partial/incremental conversion. (The
    implementation decision allows full conversions to run faster, so
    I suspect that incremental operation was added later and missed a
    few cases). I changed the script to emit a full DELETEALL()
    command and then re-add every file back in.

 4: not handling multiple git patches in the git->darcs direction: my
    memory is fuzzy, but I think the problem I observed was when I'd
    made several git commits at once, then pushed them all at the
    same time. The tool would convert them all over to darcs
    correctly (creating multiple darcs patches), but then would fail
    to update something (the internal darcs repo?) with the non-final
    patch, such that next time I ran it, the tree was in a weird
    state. Maybe the git->darcs direction involves pushing patches
    directly to the target tahoe.darcs repo, and doesn't necessarily
    update the internal darcs repo used for the darcs->git direction.
    My workaround was to run a double-checking script that looked for
    unpushed patches (just a 'darcs push' on each repo). What I
    always type is:

      make pull
      make push
      make check-everything

    and if the 'darcs push'es inside the check-everything target tell
    me that something needs to be pushed, I say yes. This only seems
    to happen when I try to git->darcs multiple git revisions at the
    same time, so I've kind of learned to not do that.
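
For problem 1 above, a narrower cleanup than swapping every \xc2 for
"X" might look like the sketch below: it deletes a \xc2 byte only
when it is not followed by a UTF-8 continuation byte, so valid
two-byte characters survive. This is a variant for illustration, not
the patch actually applied to darcs-fast-export. It is still broader
than targeting one specific comment, and the sample XML is invented
rather than real 'darcs changes --xml' output.

```python
import re
import xml.etree.ElementTree as ET

# \xc2 is only valid as the lead byte of a two-byte sequence, so it is
# "stray" whenever the next byte is not a continuation byte (0x80-0xBF).
STRAY_C2 = re.compile(rb"\xc2(?![\x80-\xbf])")


def strip_stray_c2(raw: bytes) -> bytes:
    """Remove stray \\xc2 bytes; leave well-formed UTF-8 untouched."""
    return STRAY_C2.sub(b"", raw)


# Invented sample: one stray \xc2, plus valid two-byte characters
# (\xc3\xa9 is e-acute, \xc2\xa9 is the copyright sign).
raw = (b"<changelog><patch><name>fix \xc2typo caf\xc3\xa9 \xc2\xa9"
       b"</name></patch></changelog>")
cleaned = strip_stray_c2(raw)
name = ET.fromstring(cleaned).find("patch/name").text
```

The cleaned bytes are what would be handed to the XML parser in place
of the raw output.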

Since the bridge is bidirectional, I'm worried about collisions and
how to resolve them, so I always run it manually. Back in 2006/2007
when we were migrating an earlier project from SVN to Darcs, we had
an automatic bidirectional svn<->darcs bridge running, and every
couple of months the whole thing would explode messily and we'd have
to drag Zooko in for several days of cleanup and debugging (during
which we couldn't touch the tree). A frequent source of ignition
was committing a change to one side of the bridge at the same time as
someone else was committing a different change to the other side.
Another was something in the bridge failing (a post-commit hook that
couldn't reach the buildbot, or something), leaving the bridge in a
slightly bad state, such that the next commit would really mess things up
(visualize a multi-car pileup on a freeway, except with patches
instead of cars). Very finicky.

So I only run this new git<->darcs bridge manually, where I can watch
the results and make sure everything went smoothly. The commit rate
on Tahoe is pretty slow right now, so collisions aren't very likely,
but I'm worried enough about them that I don't want to automate
anything. The resulting workflow is a drag.

Incidentally, I remember giving Tailor a try when I first started
this project, but I gave up after a few days; I no longer recall
why. I think the fast-import stream seemed like a better idea to me,
and darcs-fast-export looked to be more mature than the
incremental-conversion support in tailor. I do remember tailor being
fairly hard to set up, whereas darcs-fast-export (before I delved
into bugfixing) was pretty easy.

hope that helps,
 -Brian
msg13092 Author: kowey Date: 2010-11-17.17:34:17
Hi Brian,

Thanks for your detailed summary! We've linked to it from our wiki page 
on Darcs -> Foo bridges.

Meanwhile, you may have noticed that Petr Rockai has recently released 
version 0.2 of the darcs-fastconvert program, which uses the darcs 
library underneath and which now provides marksfile support (if I 
understand correctly, that opens the door to using it for incremental 
conversions).

Petr tells me that points 2-4 should not be a problem for
darcs-fastconvert.

Perhaps you'd be willing to try it as a potential replacement for
darcs-fast-export? Hopefully, we can get to a state where
fast-{import,export} can converge on one implementation that's
relatively close to darcs.
msg13093 Author: kowey Date: 2010-11-17.17:35:57
> Hopefully, we can get to a state where fast-{import,export} 
> can converge on one implementation that's relatively close to darcs

I meant fast-import/export *users*, i.e. it may be time to retire
darcs-fast-export in favour of darcs-fastconvert if the latter is up to
scratch.
msg17543 Author: gh Date: 2014-06-12.19:02:08
Note that darcs (screened) now has incremental fast-export support
<http://hub.darcs.net/darcs/darcs-screened/patch/20140609190214-5ef8f> .
History
Date                 User               Action  Args
2010-10-14 09:17:30  kowey              create
2010-10-14 09:25:03  kowey              set     status: unknown -> waiting-for; messages: + msg12693
2010-10-14 09:26:26  kowey              set     priority: feature
2010-10-17 19:13:03  warner-darcs-bugs  set     assignedto: warner-darcs-bugs -> (none); messages: + msg12762
2010-11-17 17:34:18  kowey              set     status: waiting-for -> unknown; messages: + msg13092
2010-11-17 17:35:58  kowey              set     messages: + msg13093
2014-06-12 19:02:09  gh                 set     nosy: + darcs-devel, simon; messages: + msg17543
2017-07-31 01:28:08  gh                 set     status: unknown -> given-up