darcs

Issue 97 wish: more flexible regexps for darcs replace

Title: wish: more flexible regexps for darcs replace
Priority: wishlist
Status: given-up
Milestone:
Resolved in:
Superseder:
Nosy List: ar, darcs-devel, dmitry.kurochkin, kowey, thorkilnaur, tommy
Assigned To:
Topics:

Created on 2006-01-13.19:13:21 by ar, last changed 2017-07-30.23:23:18 by gh.

Messages
msg368  Author: ar  Date: 2006-01-13.19:14:15
Note: the description is actually in the file "replace.darcs".
msg2065  Author: kowey  Date: 2007-08-14.06:59:54
Pasting Albert's attachment in the bugtracker for convenience

----------------------------------------------------------------------

I would like to see more general regexps in `darcs replace`.  The
proposal below (slightly edited) was sent to the darcs-users mailing
list on Dec. 11, 2005, but went without feedback.  I still think that
it would be a great improvement (at least for me), hence this wishlist
item.

Excerpt of mail to darcs-users below:

================================================================

* The problem with the current scheme

In general, I like the idea of `darcs replace` a lot.  The problem I
seem to encounter every now and then is that of specifying the tokens
in a reasonable way: simply listing the allowed token characters
apparently works well for simple cases such as C, Haskell, or
Fortran.

However, it is clearly not satisfactory for other types of text: e.g.,

- TeX, LaTeX etc., where a token might be, e.g., a backslash followed
  by characters from a given class; or where ``the na\"\i ve boy''
  should probably be seen as three tokens, not four;

- or Common Lisp, where any string can be used as a symbol name
  (though you might have to enclose it in |...|), but where certain
  characters cannot appear at the beginning of a symbol;

- or literate programs where, e.g., with noweb a token might be
  anything enclosed in << and >>, although single < or > characters in
  between are fine.
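
To make the limitation concrete, here is a small sketch in Python of a
simplified model of character-class tokens: a token is a maximal run of
characters from one class.  The helper name `char_class_tokens` is made
up for illustration and is not darcs code; only the default class
[A-Za-z_0-9] is taken from darcs itself.

```python
import re

def char_class_tokens(token_chars, text):
    """Return the maximal runs of characters drawn from token_chars.

    Simplified model only: token_chars is the inside of a regex
    character class, e.g. "A-Za-z_0-9".
    """
    return re.findall(f"[{token_chars}]+", text)

SAMPLE = r'the na\"\i ve boy'

# Default-style class: the accent macros split the word apart.
default_tokens = char_class_tokens("A-Za-z_0-9", SAMPLE)

# Adding backslash and double quote keeps na\"\i together,
# but "ve" is still a separate token.
tex_tokens = char_class_tokens(r'A-Za-z\\"', SAMPLE)

# Adding the space as well merges the whole line into one token.
merged_tokens = char_class_tokens(r'A-Za-z\\" ', SAMPLE)
```

With these three classes the sample splits into five, four, and one
token respectively; no single class yields the three tokens one would
want, because the space inside the accented word cannot be told apart
from the spaces between words.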


* Proposal for a solution:

All of the above examples could conveniently be handled if one were
able to use real regular expressions for the token, rather than for
every single character of the token.  As was pointed out on
darcs-users earlier, the problem is maintaining invertibility, which
is needed for darcs.

However, I think this can be achieved by introducing two regular
expressions, RE1 and RE2, that can be used for locating the position
and extent of the replacement before and after the replacement,
respectively.

So, upon replacing OLD by NEW,

- RE1 is used to find OLD;

- tentatively, OLD is replaced by NEW;

- the result of the replacements is checked against the combination of
  RE2 and NEW, which must accurately pinpoint the replacements: all
  instances of NEW that came from OLD, but no spurious ones;

- if so, accept the replacement; otherwise, reject it.

The inverted patch is obtained by simply switching OLD <-> NEW, and
RE1 <-> RE2.

In most cases I would expect both REi to be the same, so that RE2
might default to RE1; and if --token-chars is given, RE1 is trivially
constructed out of that.
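
The steps above can be sketched in Python roughly as follows.  This is
only an illustration of the idea under stated assumptions, not darcs
code: the names `token_spans` and `apply_replace` are hypothetical,
Python's `re` module stands in for whatever regexp engine would be
used, and a "token" is taken to be any non-overlapping match of the
regexp found by a left-to-right scan.

```python
import re

def token_spans(regex, text, token):
    """Spans of all matches of regex in text whose matched text equals token."""
    return [m.span() for m in re.finditer(regex, text) if m.group(0) == token]

def apply_replace(text, re1, old, new, re2=None):
    """Replace token old by new; return None if the result is not invertible.

    re1 locates tokens before the replacement, re2 after it
    (re2 defaults to re1, as suggested in the proposal).
    """
    if re2 is None:
        re2 = re1
    pieces = []    # fragments of the result text
    written = []   # spans, in the result, of every NEW we wrote
    cursor = 0     # position in the original text
    out_len = 0    # length of the result built so far
    for start, end in token_spans(re1, text, old):
        before = text[cursor:start]
        pieces.extend([before, new])
        out_len += len(before)
        written.append((out_len, out_len + len(new)))
        out_len += len(new)
        cursor = end
    pieces.append(text[cursor:])
    result = "".join(pieces)
    # RE2 + NEW must find exactly the replacements, no more and no fewer.
    if token_spans(re2, result, new) != written:
        return None
    return result
```

In this sketch the inverse of a successful `apply_replace(text, re1,
old, new, re2)` is `apply_replace(result, re2, new, old, re1)`, which
mirrors the OLD <-> NEW, RE1 <-> RE2 switch described above.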

* Simple example

As an example, consider the noweb chunk identifier syntax:  With

    RE1 = RE2 = <<[^<>]+([<>][^<>]+)*>>

the replacement of "<<foo>>" with "<<bar baz>>" always succeeds and is
invertible; OTOH, a replacement with "bar baz" would not succeed
because of RE2.
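
A quick way to check a chunk-name pattern of this shape is to try it
on a few strings.  The sketch below uses Python's `re` module purely
for illustration; the pattern is written with `*` on the inner group
(an assumption, so that names without an inner < or > also match), and
the helper name `is_chunk_name` is hypothetical.

```python
import re

# "<<", then text without < or >, optionally interrupted by single
# < or > characters, then ">>".
CHUNK = re.compile(r"<<[^<>]+(?:[<>][^<>]+)*>>")

def is_chunk_name(s):
    """True if the whole string s is a single chunk identifier."""
    return CHUNK.fullmatch(s) is not None

accepted = [s for s in ("<<foo>>", "<<bar baz>>", "<<a < b>>") if is_chunk_name(s)]
rejected = [s for s in ("bar baz", "<<a << b>>") if not is_chunk_name(s)]
```

Here all three strings in the first tuple are accepted and both in the
second are rejected, so "bar baz" on its own is not a token under this
pattern, which is why RE2 would refuse it as a replacement.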
msg2069  Author: droundy  Date: 2007-08-14.17:22:51
On Tue, Aug 14, 2007 at 06:59:54AM -0000, Eric Kow wrote:
> * Proposal for a solution:
> 
> All of the above examples could conveniently be handled if one were
> able to use real regular expressions for the token, rather than for
> every single character of the token.  As was pointed out on
> darcs-users earlier, the problem is maintaining invertibility, which
> is needed for darcs.
> 
> However, I think this can be achieved by introducing two regular
> expressions, RE1 and RE2, that can be used for locating the position
> and extent of the replacement before and after the replacement,
> respectively.

I agree, this can work.  The catch (as far as easy implementation goes) is
that I'd be most comfortable with the actual regexp implementation going
into darcs itself, since repository corruption could result from two
regexp implementations that disagree on a match.  I know there are
reasonable regexp engines written in Haskell, so perhaps we could take one
of these and embed it in darcs.
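
As an illustration of how two engines can disagree on the extent of a
match even when neither is buggy, the Python sketch below compares two
matching rules on the same pattern: the leftmost-first rule that
Python's backtracking `re` module uses for alternation, and the
leftmost-longest rule that POSIX specifies.  The second function is a
brute-force emulation written for this example (an assumption, not a
real POSIX engine), and both function names are hypothetical.

```python
import re

def leftmost_first(pattern, text):
    """Match as Python's re does: earliest start, first alternative that works."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

def leftmost_longest(pattern, text):
    """Emulate the POSIX rule: earliest start, then the longest match there.

    Brute force: for each start position, try the longest candidate
    substring first and shrink until one matches in full.
    """
    compiled = re.compile(pattern)
    for start in range(len(text) + 1):
        for end in range(len(text), start - 1, -1):
            if compiled.fullmatch(text, start, end):
                return text[start:end]
    return None

first = leftmost_first(r"foo|foobar", "a foobar")
longest = leftmost_longest(r"foo|foobar", "a foobar")
```

For the pattern foo|foobar the first rule stops at "foo" while the
second yields "foobar", so a replace patch recorded under one rule
would touch a different extent of text when applied under the other.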

I seem to recall that most existing regexp implementations are buggy in one
way or another, which is why I'm concerned.  And we don't even want
bugfixes going into our regexp engine once it's in use (that is to say, bug
fixes that affect regexp matching).

But apart from that challenge, this looks quite reasonable.  We'd use the
repo format functionality to ensure that older versions of darcs don't even
try to read repositories with new-style replace patches, and that newer
versions of darcs don't create such patches except in repositories for
which they're explicitly allowed.
-- 
David Roundy
Department of Physics
Oregon State University
History
Date                 User      Action  Args
2006-01-13 19:13:21  ar        create
2006-01-13 19:14:15  ar        set     status: unread -> unknown
                                       nosy: + ar
                                       messages: + msg368
2007-08-14 06:59:55  kowey     set     nosy: + kowey, beschmi
                                       messages: + msg2065
2007-08-14 07:00:06  kowey     set     files: - replace.darcs
2007-08-14 17:22:53  droundy   set     messages: + msg2069
2008-02-09 05:39:36  markstos  set     status: unknown -> deferred
                                       nosy: droundy, tommy, beschmi, kowey, ar
                                       title: more flexible regexps for REPLACE -> wish: more flexible regexps for REPLACE
2009-08-06 17:35:21  admin     set     nosy: + markstos, jast, Serware, dmitry.kurochkin, darcs-devel, zooko, dagit, mornfall, simon, thorkilnaur, - droundy, ar
2009-08-06 20:32:31  admin     set     nosy: - beschmi
2009-08-10 22:21:59  admin     set     nosy: + ar, - markstos, darcs-devel, zooko, jast, dagit, Serware, mornfall
2009-08-25 17:49:37  admin     set     nosy: + darcs-devel, - simon
2009-08-27 13:54:42  admin     set     nosy: tommy, kowey, darcs-devel, ar, thorkilnaur, dmitry.kurochkin
2009-08-27 15:02:23  kowey     set     nosy: tommy, kowey, darcs-devel, ar, thorkilnaur, dmitry.kurochkin
2009-10-03 16:58:33  kowey     set     nosy: tommy, kowey, darcs-devel, ar, thorkilnaur, dmitry.kurochkin
                                       title: wish: more flexible regexps for REPLACE -> wish: more flexible regexps for darcs replace
2017-07-30 23:23:18  gh        set     status: deferred -> given-up