We should be very careful when we extract URLs from text files that have
been copied from a repo on a different machine, or even different
account on the same machine, e.g. during a clone.
The problem is that text files are de- and encoded according to the
user's locale. If different developers work with different encodings
then a URL obtained from copying a text file from another repo may thus
become corrupted. An attacker with knowledge about the different
encodings and the URLs involved may try to use the URL corruption to
make subsequent darcs invocations access a malicious host, and
potentially even sneak in malicious patches.
I am aware of only one instance where we currently copy a text file that
contains repo URLs: the file _darcs/prefs/sources when cloning a repo.
The comments in the code (Darcs.Repository.Cache.unionRemoteRepos) say
that we never use caches from remote repos that are accessed via
network. But if the remote repo is a locally valid path, we /do/ copy
entries referring to repos accessed via network. Note that the attack
may work even when copying files from a different user on the same
machine, if that user has configured a different encoding.
Thankfully the sources file cannot be used to insert malicious patches
because we use the URLs in this file only as a cache for hashed files
which means they are resistant to tampering. But it may still cause
access to a malicious host when the user subsequently runs darcs
commands, which in turn may have more subtle security implications (like
exfiltration of secrets).
The worst case scenario, slipping in malicious patches, would only be
possible if we unwittingly pull from a malicious repo. For instance, if
we copied the _darcs/prefs/defaultrepo when cloning from a remote repo,
such as I originally planned to do for the --inherit-default option,
then the user may be tricked into pulling malicious patches into their
clone; which in due course may cause them to execute malicious code
(e.g. when running a test suite) with their full user rights.
One way to get rid of the problem could be to change access to such
files from text mode to binary mode and fix the encoding as UTF-8. This
comes at a cost, though: if a remote repo's location is not a valid
unicode string, then we can no longer use that as our defaultrepo, not
even by manually editing the file (we'll get a decoding error instead).