darcs

Issue 2602 add command 'darcs config'

Title add command 'darcs config'
Priority feature Status unknown
Milestone Resolved in
Superseder Nosy List bfrk
Assigned To
Topics

Created on 2018-10-01.07:07:06 by bfrk, last changed 2019-06-23.17:57:54 by bfrk.

Messages
msg20346 (view) Author: bfrk Date: 2018-10-01.07:07:03
We should unify the several ways of configuring darcs with a single 
command 'darcs config'. This should work on the current repo unless we 
pass --global, which means edit the global (user) configuration. Standard 
action is to add a setting, with option --remove to instead remove it.

Syntax:

darcs config COMMAND OPTION
darcs config defaultrepo URL
darcs config email EMAIL -- badly named, better idea anyone?
darcs config author EMAIL
... any more? E.g. sources

and more generally

darcs config NAME VALUE

to replace environment variables (see below for details)

See also my comment on issue1898, part (b).

In principle, I would like to store configuration in a unified file 
format, but that would cause lots of compatibility problems. With 'darcs 
config' we could at least hide the several file formats that we currently 
use from the casual user.

Lastly, there are *way too many* environment variables that influence how 
darcs operates. Environment variables are a notoriously bad way to 
configure a user program. An exception to this rule are generic 
environment variables that the user can set to influence the behavior of 
many programs at once. This includes HOME, APPDATA, XDG_CACHE_HOME, 
EDITOR, PAGER, TMPDIR, EMAIL, and the various XYZ_PROXY variables. These 
should all be honored by default which is what we do now.

Everyting else, i.e mostly the variables that start with DARCS_ should be 
replaced into a new preference file format with a simple key=value 
syntax, or else scrapped because there is already a better way to 
configure them. In particular:

DARCS_EDITOR, DARCS_PAGER: preference, overrides standards env vars

DARCS_DONT_COLOR, DARCS_ALWAYS_COLOR, DARCS_ALTERNATIVE_COLOR, 
DARCS_DO_COLOR_LINES: ditto, but try to refactor

DARCS_DONT_ESCAPE_EXTRA,
DARCS_ESCAPE_EXTRA: i don't think anyone wants to configure the behavior 
at this fine level of granularity so I'd say remove them

DARCS_DONT_ESCAPE_TRAILING_CR: this should be decided depending on the OS 
we run on

DARCS_DONT_ESCAPE_ISPRINT: obsolete, scrap

DARCS_DONT_ESCAPE_TRAILING_SPACES,
DARCS_DONT_ESCAPE_ANYTHING,
DARCS_ESCAPE_8BIT: need to decide what to do about these, perhaps 
environment vars make a certain sense here for use in shell scripts

DARCS_TMPDIR: preference, overrides standard env var

DARCS_KEEP_TMPDIR: preference

DARCS_EMAIL: scrap, superseeded by global authors file
SENDMAIL: scrap, superseeded by defaults (i.e. use --sendmail-command 
CMD, we can give arguments to defaults since at least 2.12)

DARCS_SLOPPY_LOCKS: check if this is still needed

DARCS_SSH, DARCS_SCP, DARCS_SFTP: preferences

DARCS_PROXYUSERPWD: putting a password in plain text into an env var is a 
very bad idea IMO, should be removed

DARCS_CONNECTION_TIMEOUT: preferences (currently used only if built with 
-fcurl)
msg20851 (view) Author: bfrk Date: 2019-06-17.14:42:55
It has been a while since I opened this issue. Guess I'll have to send 
some code to get a reaction...
msg20852 (view) Author: ganesh Date: 2019-06-17.21:50:53
I'm happy with the description of darcs config. I'm also all in
favour of removing env vars. Shouldn't we make them into options
though (that could be given defaults in prefs files)?
msg20853 (view) Author: bfrk Date: 2019-06-18.08:54:27
The idea of using options as the single uniform mechanism for all 
configuration does have a certain charm to it. It is not without problems, 
though.

As the number of options increases, especially those that are uniformly 
valid for all commands, this has the unfortunate tendency to clutter up 
the UI. Take for instance pre- and posthooks: I cannot think of a 
situation where I would want to specify them on the command line. Still 
their descriptions take up a lot of space in the help output (8 lines!) as 
well as when the shell displays options during command completion. If we 
add even more options like that to replace environment variables, we 
really need to do something about that.

The obvious solution is to hide such options in the help output (at least 
by default) and the command completion (i.e. --list-options). We currently 
distinguish between "basic" and "advanced" options; if we add another axis 
for "command specific" vs. "generic" options, then options that are 
"advanced" and "generic" could be hidden by default. This should include: 
all hook options, --no-cache, --timings, --debug, --debug-http, --list-
options, --disable; it would probably include most of the options we might 
add to replace env vars.

(We could do that independent of whether we add more such options to 
replace env vars. Implementing this may not be as easy, though.)

Then there is the technical problem of how to pass such options down to 
where they are needed. We can read env vars in any IO action but options 
are being explicitly passed. I would rather not add more global variables 
if it can be helped, but I guess we'd have to bite that bullet. Some 
people nowadays recommend to use a 'ReaderT Config IO' and I would use 
that pattern if I had to re-write Darcs from scratch. I have a hard time 
seeing us do this as a refactor, though.

                         * * * * * * *

Now for a completely different idea. Let us re-examine /why/ environment 
variables are a bad choice for configuring a tool like Darcs:

(1) They cannot be specified per repository, only per user.
(2) There is no uniform and predictable way to set them: it depends
    on the shell you are using, settings tend to become strewn over
    many different files, they may be set conditionally, etc etc.
(3) AFAIR environment variables are quite awkward to use on Windows.

On the plus side, environment variables can be set by the admin to a 
default value for all users, which can be quite useful in certain 
situations.

We can solve all three problems with one stroke by adding two more 
configuration files for Darcs: _darcs/prefs/environment and 
~/.darcs/environment. (The file name is up for grabs, could as well name 
it "settings" or "config", whatever.) These files contain simple 'KEY = 
VALUE' definitions. On startup we read these files (in order from per-
user, then per-repo), and for each entry we /add/ a corresponding 
definition to our environment. (It just occurred to me that we may also 
need an 'unset KEY' command to remove an existing environment variable / 
setting.) We could do a bit of translation for keys to make the file 
format less awkward e.g. add the DARCS_ prefix or turn lower to upper 
case; details can be worked out.

Besides solving the above problems, I see the following advantages:

* fully compatible with existing configurations and darcs versions
* doesn't complicate the UI any further
* with 'unset key' there is no need to awkwardly add negated versions
  of every setting (as we currently need for options)
* requires no deep refactorings in the source code, all settings are
  readily available to every IO action
* easier to add new settings, compared to adding a new command line
  option

If we agree on this proposal I can even imagine gradually phasing out some 
of the more obscure options and replacing them with such "environment" 
variables / settings. We may also want to add support for reading an 
additional per-system configuration file.
msg20859 (view) Author: ganesh Date: 2019-06-18.20:08:58
On 18/06/2019 10:54, Ben Franksen wrote:
> 
> The obvious solution is to hide such options in the help output (at least 
> by default) and the command completion (i.e. --list-options). We currently 
> distinguish between "basic" and "advanced" options; if we add another axis 
> for "command specific" vs. "generic" options, then options that are 
> "advanced" and "generic" could be hidden by default. This should include: 
> all hook options, --no-cache, --timings, --debug, --debug-http, --list-
> options, --disable; it would probably include most of the options we might 
> add to replace env vars.

I agree with this. (I guess there ought to be some signpost to them
somewhere in the help.)

> Then there is the technical problem of how to pass such options down to 
> where they are needed. We can read env vars in any IO action but options 
> are being explicitly passed. I would rather not add more global variables 
> if it can be helped, but I guess we'd have to bite that bullet. Some 
> people nowadays recommend to use a 'ReaderT Config IO' and I would use 
> that pattern if I had to re-write Darcs from scratch. I have a hard time 
> seeing us do this as a refactor, though.

I'm not keen on ReaderT Config IO because we'll be passing the whole
config to places where only one value might be needed. My personal
preference would be implicit parameters, but I know they aren't
generally popular.


> Now for a completely different idea. Let us re-examine /why/ environment 
> variables are a bad choice for configuring a tool like Darcs:
> 
> (1) They cannot be specified per repository, only per user.
> (2) There is no uniform and predictable way to set them: it depends
>     on the shell you are using, settings tend to become strewn over
>     many different files, they may be set conditionally, etc etc.
> (3) AFAIR environment variables are quite awkward to use on Windows.

For me the biggest problem with environment variables is that they are
really problematic when using darcs as a library. The calling code
doesn't know about them and even it it did it's not exactly a pleasant
API to have to set an environment variable before calling a function.

FWIW, they're not actually particularly hard to use on Windows. The UI
for setting them permanently is a bit unpleasant but there's a 'setx'
shell command that helps a bit.

> We can solve all three problems with one stroke by adding two more 
> configuration files for Darcs: _darcs/prefs/environment and 
> ~/.darcs/environment. (The file name is up for grabs, could as well name 
> it "settings" or "config", whatever.) These files contain simple 'KEY = 
> VALUE' definitions. On startup we read these files (in order from per-
> user, then per-repo), and for each entry we /add/ a corresponding 
> definition to our environment. (It just occurred to me that we may also 
> need an 'unset KEY' command to remove an existing environment variable / 
> setting.) We could do a bit of translation for keys to make the file 
> format less awkward e.g. add the DARCS_ prefix or turn lower to upper 
> case; details can be worked out.

This makes it hard to set them on a per-invocation basis. I barely use
them anyway, but when I do it's by doing something like

  DARCS_SOMETHING=yes darcs ...

I may be atypical because I'd normally just be experimenting for
development purposes rather than actually having decided I need them
permanently.

If we would still be using the environment (or other global state) to
pass them around inside darcs then my complaint about the library still
applies.
msg20862 (view) Author: bfrk Date: 2019-06-19.10:35:44
>> The obvious solution is to hide such options in the help output [...]
> 
> I agree with this. (I guess there ought to be some signpost to them
> somewhere in the help.)

Of course.

>> Then there is the technical problem of how to pass such options down to 
>> where they are needed. [...]
> 
> I'm not keen on ReaderT Config IO because we'll be passing the whole
> config to places where only one value might be needed
In https://www.fpcomplete.com/blog/2017/06/readert-design-pattern
under the heading "Has typeclass approach" he describes how to use type
classes to avoid this. He also describes how we can let TH generate all
the boilerplate using the lens library (I checked that the example works
equally well with microlens).

The way HasX constraints propagate upward from the point of use is
actually pretty similar to implicit parameter constraints. The main
difference is that with ReaderT we are forced to use monadic style,
whereas with implicit parameters we aren't. But if we want to replace
env vars we already are in the IO monad, so that wouldn't make a
difference for us.

> My personal
> preference would be implicit parameters, but I know they aren't
> generally popular.

I did some research to find out why they aren't popular. It seems that
there are some good reasons. This discussion:

https://www.reddit.com/r/haskell/comments/6gz4w5/whats_wrong_with_implicitparams/

is an interesting read with some quite scary examples. Most
disturbingly, the meaning of a local definition can change depending on
whether it has a type signature or not. This is also mentioned in the
ghc manual here:

https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#implicit-parameters-and-polymorphic-recursion

One reason I tend to like the ReaderT approach more is precisely
/because/ it enforces monadic style. Allowing dynamically scoped
variables in arbitrary (pure) code scares me a lot because they make
reasoning (in particular: refactoring) much more difficult.

On the practical side, I have read that you are not allowed to use
implicit parameter constraints as superclass constraints for classes or
instances (because that introduces incoherence). This is what E. Kmett
says here:

https://www.reddit.com/r/haskell/comments/5xqozf/implicit_parameters_vs_reflection/

I may be wrong but i think we would need to do exactly that in order to
remove the environment variables that configure the behavior of our
Printer type, since that functionality is used by generic code under
Darcs.Patch.

BTW, the reflection library is something I haven't been able to wrap my
head around. I look at the API and I can't see how to use it to achieve
what it claims. If you happen to understand it, I would like to see the
ReaderT example from the link above transscribed to use reflection instead.

> For me the biggest problem with environment variables is that they are
> really problematic when using darcs as a library. The calling code
> doesn't know about them and even it it did it's not exactly a pleasant
> API to have to set an environment variable before calling a function.

Good point. Global (environment or in-program) variables are always a
bad choice for libraries.

(BTW, this also applies to the current working directory which you have
to initialize properly for most procedures from Darcs.Repository and
Darcs.UI.Commands; unless we completely remove the assumption of "."
being the current repository from them.)

>> We can solve all three problems with one stroke by adding two more 
>> configuration files for Darcs: _darcs/prefs/environment and 
>> ~/.darcs/environment. [...]
> 
> This makes it hard to set them on a per-invocation basis. I barely use
> them anyway, but when I do it's by doing something like
> 
>   DARCS_SOMETHING=yes darcs ...
> 
> I may be atypical because I'd normally just be experimenting for
> development purposes rather than actually having decided I need them
> permanently.

True. I didn't think of that either.

(BTW, I use that idiom, too, though for a different purpose. I have
DARCS_ALWAYS_COLOR=1 by default because I like to pipe things through
less (directly or let darcs do it) which understands color codes. So I
need to disable that when redirecting the output to a file...)

To circumvent this problem, we could add /just one/ additional option
--config "VAR=value" that can be given multiple times to override any of
the settings on a per-invocation basis.

Your other objections still stand, of course.

Cheers
Ben
msg20869 (view) Author: ganesh Date: 2019-06-23.10:09:03
On 19/06/2019 11:35, Ben Franksen wrote:

>> I'm not keen on ReaderT Config IO because we'll be passing the whole
>> config to places where only one value might be needed
> In https://www.fpcomplete.com/blog/2017/06/readert-design-pattern
> under the heading "Has typeclass approach" he describes how to use type
> classes to avoid this. He also describes how we can let TH generate all
> the boilerplate using the lens library (I checked that the example works
> equally well with microlens).

I think the Has typeclass is a good approach. Less keen on TH unless we
have other compelling needs for it in darcs as well, it tends to make
everything more complicated. For example binding group analysis is
messed up by top-level splices which in my experience can lead to a lot
of head-scratching, and it probably makes things less portable.

(For more detail on binding group analysis, see the comment "Top-level
declaration splices break up a source file" in
https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#template-haskell
)

>> My personal
>> preference would be implicit parameters, but I know they aren't
>> generally popular.
> 
> I did some research to find out why they aren't popular. It seems that
> there are some good reasons. This discussion:
> 
> https://www.reddit.com/r/haskell/comments/6gz4w5/whats_wrong_with_implicitparams/
> 
> is an interesting read with some quite scary examples. Most
> disturbingly, the meaning of a local definition can change depending on
> whether it has a type signature or not. This is also mentioned in the
> ghc manual here:
> 
> https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#implicit-parameters-and-polymorphic-recursion

I'm 'hsenag' in that reddit thread. As I say there, I like using them in
cases where no recursive use is contemplated. I think probably even
expecting to need to rebind one (non-recursively) would be an indicator
not to use them.

> One reason I tend to like the ReaderT approach more is precisely
> /because/ it enforces monadic style. Allowing dynamically scoped
> variables in arbitrary (pure) code scares me a lot because they make
> reasoning (in particular: refactoring) much more difficult.

TBH I don't see the difference. The Reader monad is just as dynamically
scoped as implicit params: if you rebind the reader value you can get
into just as much of a pickle, I think.

> On the practical side, I have read that you are not allowed to use
> implicit parameter constraints as superclass constraints for classes or
> instances (because that introduces incoherence). This is what E. Kmett
> says here:
> 
> https://www.reddit.com/r/haskell/comments/5xqozf/implicit_parameters_vs_reflection/
> 
> I may be wrong but i think we would need to do exactly that in order to
> remove the environment variables that configure the behavior of our
> Printer type, since that functionality is used by generic code under
> Darcs.Patch.

Well, we'd need to put them in the type signatures of the class methods.
But the same is true of using ReaderT, or conversely if it's not true of
using ReaderT, then the same is true of implicit params. I suspect it
depends on whether we need the vars to *generate* a Doc or just to
output it.


> BTW, the reflection library is something I haven't been able to wrap my
> head around. I look at the API and I can't see how to use it to achieve
> what it claims. If you happen to understand it, I would like to see the
> ReaderT example from the link above transscribed to use reflection instead.

I've never fully understood the point of it, but it feels like another
way of doing something implicit-parameter-like.

>> For me the biggest problem with environment variables is that they are
>> really problematic when using darcs as a library. The calling code
>> doesn't know about them and even it it did it's not exactly a pleasant
>> API to have to set an environment variable before calling a function.
> 
> Good point. Global (environment or in-program) variables are always a
> bad choice for libraries.
> 
> (BTW, this also applies to the current working directory which you have
> to initialize properly for most procedures from Darcs.Repository and
> Darcs.UI.Commands; unless we completely remove the assumption of "."
> being the current repository from them.)

Agreed, and that's a far worse problem in that I run into it constantly
when trying to code against darcs-the-library. Environment variables on
the other hand are mostly a theoretical problem right now. Conventional
wisdom has been that it (using CWD) is needed for performance reasons,
but I don't know how true that is nowadays.

>>> We can solve all three problems with one stroke by adding two more 
>>> configuration files for Darcs: _darcs/prefs/environment and 
>>> ~/.darcs/environment. [...]
>>
>> This makes it hard to set them on a per-invocation basis. I barely use
>> them anyway, but when I do it's by doing something like
>>
>>   DARCS_SOMETHING=yes darcs ...
>>
>> I may be atypical because I'd normally just be experimenting for
>> development purposes rather than actually having decided I need them
>> permanently.
> 
> True. I didn't think of that either.
> 
> (BTW, I use that idiom, too, though for a different purpose. I have
> DARCS_ALWAYS_COLOR=1 by default because I like to pipe things through
> less (directly or let darcs do it) which understands color codes. So I
> need to disable that when redirecting the output to a file...)
> 
> To circumvent this problem, we could add /just one/ additional option
> --config "VAR=value" that can be given multiple times to override any of
> the settings on a per-invocation basis.

That sounds fine. My main argument for using the existing flags
mechanism is avoiding having multiple overlapping mechanisms. But if we
can clearly differentiate the two kinds of things and the
overhead/boilerplate of adding flags is high, there's a strong argument
for a new mechanism.
msg20870 (view) Author: bfrk Date: 2019-06-23.17:57:51
>> In https://www.fpcomplete.com/blog/2017/06/readert-design-pattern
>> under the heading "Has typeclass approach" he describes how to use type
>> classes to avoid this. He also describes how we can let TH generate all
>> the boilerplate using the lens library (I checked that the example works
>> equally well with microlens).
> 
> I think the Has typeclass is a good approach. Less keen on TH unless we
> have other compelling needs for it in darcs as well, it tends to make
> everything more complicated.

Lenses without TH mean /lots/ of boilerplate, though.

> For example binding group analysis is
> messed up by top-level splices which in my experience can lead to a lot
> of head-scratching, and it probably makes things less portable.
> 
> (For more detail on binding group analysis, see the comment "Top-level
> declaration splices break up a source file" in
> https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#template-haskell
> )

Ha! I've been stumbling over that problem a few times but didn't
investigate. I just found that moving all splices in a file together in
one place fixed it. Now I know why.

Yes, that's kinda ugly. But compared to all the boilerplate... we're
talking maybe several dozens of option values... plus any number of
configuration variables that we currently use the environment for.

>>> [ about implicit parameters]
>> I did some research to find out why they aren't popular. [...]
>> the meaning of a local definition can change depending on
>> whether it has a type signature or not.
> [...] I like using them in
> cases where no recursive use is contemplated. I think probably even
> expecting to need to rebind one (non-recursively) would be an indicator
> not to use them.

If this is the only problematic scenario, perhaps we could live with that.

>> One reason I tend to like the ReaderT approach more is precisely
>> /because/ it enforces monadic style. Allowing dynamically scoped
>> variables in arbitrary (pure) code scares me a lot because they make
>> reasoning (in particular: refactoring) much more difficult.
> 
> TBH I don't see the difference. The Reader monad is just as dynamically
> scoped as implicit params: if you rebind the reader value you can get
> into just as much of a pickle, I think.

Yes, clearly. But with monadic code I am used to being careful about
effects and their order. When refactoring pure code I can be much more
ruthless and I don't want to loose that guarantee, i.e. the ability to
pick things apart and put them together in different ways, not having to
worry about effects and ordering.

>> On the practical side, I have read that you are not allowed to use
>> implicit parameter constraints as superclass constraints for classes or
>> instances (because that introduces incoherence). This is what E. Kmett
>> says here:
>>
>> https://www.reddit.com/r/haskell/comments/5xqozf/implicit_parameters_vs_reflection/
>>
>> I may be wrong but i think we would need to do exactly that in order to
>> remove the environment variables that configure the behavior of our
>> Printer type, since that functionality is used by generic code under
>> Darcs.Patch.
> 
> Well, we'd need to put them in the type signatures of the class methods.

Good point. So this restriction doesn't bite as hard as I thought.

> But the same is true of using ReaderT

True.

>, or conversely if it's not true of
> using ReaderT, then the same is true of implicit params. I suspect it
> depends on whether we need the vars to *generate* a Doc or just to
> output it.

Just to output it. More precisely, to turn a Doc into a String or
ByteString that can then be written to some handle. And it only concerns
the interactive output; for the on-disk representation we always use the
plain printer that cannot be configured.

The env vars are all read inside getPolicy which is only called by
fancyPrinters, which in turn gets passed directly to the output
routines. This is so because getPolicy needs the handle to decide
whether to use colors and escaping by default: if it's a terminal device
than yes, else no.

So the constraint would not infect things like showPatch. We'd have to
see how to handle debug and error messages, though.

>>> For me the biggest problem with environment variables is that they are
>>> really problematic when using darcs as a library. [...]>>
>> (BTW, this also applies to the current working directory which you have
>> to initialize properly for most procedures from Darcs.Repository and
>> Darcs.UI.Commands; unless we completely remove the assumption of "."
>> being the current repository from them.)
> 
> Agreed, and that's a far worse problem in that I run into it constantly
> when trying to code against darcs-the-library. Environment variables on
> the other hand are mostly a theoretical problem right now. Conventional
> wisdom has been that it (using CWD) is needed for performance reasons,
> but I don't know how true that is nowadays.

While this is indeed a problem, it is only tangential to configuration,
so I'd say we should discuss this one separately.

> My main argument for using the existing flags
> mechanism is avoiding having multiple overlapping mechanisms. But if we
> can clearly differentiate the two kinds of things and the
> overhead/boilerplate of adding flags is high, there's a strong argument
> for a new mechanism.

I remain undecided about this. And since I have lots of trouble ATM with
more dire problems like failing QC tests even for V3 I will postpone
work on this issue. Still, thanks for your input, if and when I take
this up again I will surely come back to it.
History
Date User Action Args
2018-10-01 07:07:06bfrkcreate
2019-06-17 14:42:57bfrksetmessages: + msg20851
2019-06-17 21:50:54ganeshsetmessages: + msg20852
2019-06-18 08:54:30bfrksetmessages: + msg20853
2019-06-18 08:59:43bfrksettitle: add command 'darcs config' -> add command 'darcs config' / re-think how to configure darcs
2019-06-18 20:09:01ganeshsetmessages: + msg20859
title: add command 'darcs config' / re-think how to configure darcs -> add command 'darcs config'
2019-06-19 10:35:47bfrksetmessages: + msg20862
2019-06-23 10:09:07ganeshsetmessages: + msg20869
2019-06-23 17:57:54bfrksetmessages: + msg20870