Sure thing. There's a format called "fast-import" which originated
with Git, I believe, but is probably useful as a pan-DVCS interchange
format (at least among the revision-based systems; Darcs is in a
different category, so there's somewhat of an impedance mismatch).
It's a simple text-based stream of information about each change,
expressed as commands in a simple language, like
ADD(filename,contents), DELETE(filename), DELETEALL, RENAME(old,new),
etc. There is a collection of tools to emit and consume this stream,
usually named "*-fast-import" or "*-fast-export".
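To make the stream format concrete, here is a minimal Python sketch that builds one commit record. The command spellings (M, D, R, deleteall, and the length-prefixed "data" block) are the ones git's fast-import documentation uses for the ADD/DELETE/RENAME/DELETEALL operations above; the helper names data_block and commit_record are made up for illustration, and the sketch skips details such as quoting of unusual paths.

```python
def data_block(payload: bytes) -> bytes:
    """A 'data' command: exact byte count, the raw bytes, then a newline."""
    return b"data %d\n" % len(payload) + payload + b"\n"


def commit_record(ref, mark, committer, when, message, changes):
    """Build one fast-import commit (hypothetical helper, not from any tool).

    changes is a list of tuples:
      ("add", path, contents_bytes), ("delete", path),
      ("rename", old, new), ("deleteall",)
    Paths with spaces or special characters would need quoting; omitted here.
    """
    out = [
        b"commit %s\n" % ref.encode("utf-8"),
        b"mark :%d\n" % mark,
        # 'when' is seconds since the epoch; this sketch always claims UTC
        b"committer %s %d +0000\n" % (committer.encode("utf-8"), when),
        data_block(message.encode("utf-8")),
    ]
    for change in changes:
        op = change[0]
        if op == "add":
            _, path, contents = change
            # 100644 is the mode for an ordinary, non-executable file
            out.append(b"M 100644 inline %s\n" % path.encode("utf-8"))
            out.append(data_block(contents))
        elif op == "delete":
            out.append(b"D %s\n" % change[1].encode("utf-8"))
        elif op == "rename":
            old, new = change[1].encode("utf-8"), change[2].encode("utf-8")
            out.append(b"R %s %s\n" % (old, new))
        elif op == "deleteall":
            out.append(b"deleteall\n")
        else:
            raise ValueError("unknown change type: %r" % (op,))
    out.append(b"\n")
    return b"".join(out)
```

Writing the returned bytes to the standard input of "git fast-import" in an empty repository is the usual way to consume such a record.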
The one I'm using is called "darcs-fast-export". It started with
Miklos Vajna (http://vmiklos.hu/project/darcs-fast-export/), was
merged into bzr-fastimport (https://launchpad.net/bzr-fastimport),
but sadly the maintainer of bzr-fastimport passed away. The most
recent source I've seen is on Miklos' mirror at
git://vmiklos.hu/bzr-fastimport . I have a few patches of my own
(rather old, I don't know if they've been merged elsewhere) at
http://github.com/warner/darcs-fast-export .
darcs-fast-export installs a new subcommand for git (stored as a
program named "git-darcs" somewhere on your $PATH, so when you run
'git darcs', /usr/bin/git winds up invoking /usr/bin/git-darcs). It
also installs two programs named "darcs-fast-export" and
"darcs-fast-import", which are run on a darcs repo and create/consume
the fast-import stream. These commands understand "mark files", which
record numbered revision/patch identifiers, so you can re-start the
process later and skip over the work that was already done.
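The mark-file idea can be sketched roughly as follows. This assumes the ":<number> <identifier>" one-per-line layout that git's own fast-import/fast-export mark files use; the files darcs-fast-export writes may differ in detail, and load_marks, pending and next_mark are made-up helper names.

```python
def load_marks(text):
    """Parse ':<number> <identifier>' lines into {number: identifier}."""
    marks = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        mark, ident = line.split(None, 1)
        marks[int(mark.lstrip(":"))] = ident
    return marks


def pending(all_ids, marks):
    """Return the identifiers not yet recorded in the mark file, in order."""
    done = set(marks.values())
    return [ident for ident in all_ids if ident not in done]


def next_mark(marks):
    """The mark number to assign to the next converted revision."""
    return max(marks, default=0) + 1
```

On a re-run, only the identifiers returned by pending() need converting, and numbering continues from next_mark().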
The bidirectional bridge I use for Tahoe has a tahoe.darcs and a
tahoe.git sitting next to each other. There's a simple Makefile with
"push" and "pull" targets. The tahoe.git repo was set up with
something like:
    mkdir tahoe.git
    cd tahoe.git
    git init
    git darcs add upstream ../tahoe.darcs --encoding=utf-8 \
        --authors-file=../authormap
(oh yeah, there's an 'authormap' which manages conversion of author
names from darcs' metadata into git's)
The 'pull' target then does:
    cd tahoe.darcs && darcs pull -a
    cd tahoe.git && git darcs pull upstream
and 'push' does:
    cd tahoe.git && git darcs push upstream
    cd tahoe.darcs && darcs push -a
(the tahoe.darcs repo has a _darcs/prefs/defaultrepo that points to
the canonical Tahoe darcs repo, so that step is what triggers the
buildbot, publishes it to the world, etc)
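For anyone who prefers a script to a Makefile, the two targets can be sketched in Python like this. The commands and directory names are exactly the ones listed above; the STEPS table and run_target function are hypothetical names, and the sketch assumes it is run from the directory that contains both repos.

```python
import subprocess

# Each target is an ordered list of (working directory, command) pairs.
STEPS = {
    "pull": [
        ("tahoe.darcs", ["darcs", "pull", "-a"]),
        ("tahoe.git", ["git", "darcs", "pull", "upstream"]),
    ],
    "push": [
        ("tahoe.git", ["git", "darcs", "push", "upstream"]),
        ("tahoe.darcs", ["darcs", "push", "-a"]),
    ],
}


def run_target(target, runner=subprocess.run):
    """Run the steps of one target in order, stopping at the first failure.

    With the default runner, check=True makes a non-zero exit status raise
    CalledProcessError, so a failed first step never reaches the second one.
    """
    for cwd, argv in STEPS[target]:
        runner(argv, cwd=cwd, check=True)
```

Calling run_target("pull") and then run_target("push") mirrors the order of the Makefile targets; the runner parameter exists only so the steps can be exercised without touching real repos.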
I make sure to do a 'pull' before doing a 'push'. I also have some
scripts to check all the internal repos for uncommitted changes after
either, to guard against some of the bugs below, so I know when to
step in and fix things up manually.
Problems:
There are a number of bugs in darcs-fast-import that I've had to work
around. I tried to fix a few, but was not always successful. Here are
the ones I can remember:
1: bad UTF-8 in darcs commit messages. We have one early commit,
from around 2007, that has a spurious \xc2 in a checkin comment.
(I suspect that modern versions of darcs will prevent this, but
back then, maybe not). darcs-fast-export uses 'darcs changes
--xml' to find out what work needs to be done, and of course runs
an XML parser on the output. The stray \xc2 caused the parser to
fail. My workaround was to replace the \xc2 with "X" before
passing it to the XML parser: ugly but functional. If I were
re-running the conversion now, I'd make the replacement be much
more specific, so that it would only ever spot the \xc2 in that
particular comment, and just delete the \xc2 completely.
2: darcs weirdness while applying patches: darcs-fast-export appears
to work by reading patch metadata out of the XML output, using
that to locate a file in _darcs/patches/PATCHID.gz, combining the
metadata with the patch.gz to construct a stream that is fed into
"darcs apply" on an internal darcs repo, and walking the
newly-modified darcs tree to locate files that have changed, then
putting those files and their contents into the fast-export
command stream. (actually, I think it puts files/contents of
*all* files into that stream, rather than trying to figure out a
minimal subset that have changed). I had at least one instance
where the 'darcs apply' failed because of uncommitted changes
left over by a previous apply, probably when conflicts or their
resolving patches left the tree in a weird state. I believe I had
to do a 'darcs repair' on the internal repo to get it to
continue.
3: not removing all deleted files in the darcs->git direction. The
original darcs-fast-export script behaved correctly for full
conversions, but failed to emit all the necessary DELETE()
commands when doing a partial/incremental conversion. (The
implementation decision allows full conversions to run faster, so
I suspect that incremental operation was added later and missed a
few cases). I changed the script to emit a full DELETEALL()
command and then re-add every file back in.
4: not handling multiple git patches in the git->darcs direction: my
memory is fuzzy, but I think the problem I observed was when I'd
made several git commits at once, then pushed them all at the
same time. The tool would convert them all over to darcs
correctly (creating multiple darcs patches), but then would fail
to update something (the internal darcs repo?) with the non-final
patch, such that next time I ran it, the tree was in a weird
state. Maybe the git->darcs direction involves pushing patches
directly to the target tahoe.darcs repo, and doesn't necessarily
update the internal darcs repo used for the darcs->git direction.
My workaround was to run a double-checking script that looked for
unpushed patches (just a 'darcs push' on each repo). What I
always type is:
    make pull
    make push
    make check-everything
and if the 'darcs push'es inside the check-everything target tell
me that something needs to be pushed, I say yes. This only seems
to happen when I try to git->darcs multiple git revisions at the
same time, so I've kind of learned to not do that.
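As an illustration of a narrower workaround for problem 1, here is a small Python sketch. It is not the replace-with-"X" patch described above but a different technique: it drops only bytes that are not valid UTF-8, so a stray \xc2 disappears while legitimate two-byte sequences that start with \xc2 survive. The function names and the sample XML are made up for the example; real 'darcs changes --xml' output carries more attributes.

```python
import xml.etree.ElementTree as ET


def drop_invalid_utf8(raw: bytes) -> bytes:
    """Remove bytes that do not form valid UTF-8; leave everything else alone."""
    return raw.decode("utf-8", errors="ignore").encode("utf-8")


def parse_changes(raw: bytes):
    """Parse XML bytes after stripping invalid UTF-8 (hypothetical helper)."""
    return ET.fromstring(drop_invalid_utf8(raw))


# Made-up sample: one stray \xc2 before "typo", plus a valid \xc2\xa9 (the
# copyright sign) that must not be touched.
sample = (
    b"<changelog><patch hash='p1'>"
    b"<name>fix \xc2typo, keep \xc2\xa9</name>"
    b"</patch></changelog>"
)

root = parse_changes(sample)
name = root.find("patch/name").text
```

The trade-off is that this silently discards any invalid byte anywhere in the output, so logging what was dropped would be a sensible addition before relying on it.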
Since the bridge is bidirectional, I'm worried about collisions and
how to resolve them, so I always run it manually. Back in 2006/2007
when we were migrating an earlier project from SVN to Darcs, we had
an automatic bidirectional svn<->darcs bridge running, and every
couple of months the whole thing would explode messily and we'd have
to drag Zooko in for several days of cleanup and debugging (during
which we couldn't touch the tree). A frequent source of ignition
was committing a change to one side of the bridge at the same time as
someone else was committing a different change to the other side.
Another was a failure somewhere in the bridge (a post-commit hook
couldn't reach the buildbot, or something) that left it in a slightly
bad state, such that the next commit would really mess things up
(visualize a multi-car pileup on a freeway, except with patches
instead of cars). Very finicky.
So I only run this new git<->darcs bridge manually, where I can watch
the results and make sure everything went smoothly. The commit rate
on Tahoe is pretty slow right now, so collisions aren't very likely,
but I'm worried enough about them that I don't want to automate
anything. The resulting workflow is a drag.
Incidentally, I remember giving Tailor a try when I first started
this project, but I gave up after a few days. I no longer recall
why. I think the fast-import stream seemed like a better idea to me,
and darcs-fast-export looked to be more mature than the
incremental-conversion support in tailor. I do remember tailor being
fairly hard to set up, whereas darcs-fast-export (before I delved
into bugfixing) was pretty easy.
hope that helps,
-Brian