darcs

Issue 1094 typo blacklist for patch names and descriptions

Title typo blacklist for patch names and descriptions
Priority feature Status wont-fix
Milestone Resolved in
Superseder Nosy List darcs-devel, dmitry.kurochkin, kowey, thorkilnaur, twb
Assigned To twb
Topics

Created on 2008-09-23.03:28:02 by twb, last changed 2010-03-30.13:32:27 by kowey.

Files
File name Uploaded Type Edit Remove
Spelling.pm twb, 2008-11-06.05:21:12 text/x-perl
blacklist_based-spell_checking-module_-first-attempt_.dpatch twb, 2008-11-06.07:33:49 text/x-darcs-patch
unnamed twb, 2008-11-06.07:33:49 text/plain
Messages
msg6091 (view) Author: twb Date: 2008-09-23.03:28:00
Darcs' powerful cherry-picking features make it important to avoid
typos in patch names.  For example, if a user accidentally did

    darcs rec -am 'debain: bump changelog to 2.0.3~pre1-1"

then later doing

    darcs send -p ^debian:

would not match this patch.  For inattentive users like myself, this
can actually result in data loss -- because I *think* all the
important patches have been sent/pushed, and I rm -rf my local copy.

There are two ways to do spellchecking:

- The checker has a whitelist of "correct" words; if its not in this
  list, it's a mistake.  This results in lots of false positives.

- The checker has a blacklist of common typos; if its not in this
  list, it's correct.  This results in lots of false negative.

On the mailing list the general consensus (which I agree with) is that
the former alternative is a lot of work and will annoy a lot of people
(particularly when you take into account multilingual environments).

I'd like to reiterate my argument for the latter alternative, because
the fallback behaviour -- false negatives -- is what darcs does now,
without any spellchecking!

Here's a mockup of how this feature might behave.

    $ cat _darcs/typos
    typo tyop
    # Enforce en_GB-oed (Oxford English) conventions.
    colour color
    realize realise
    $ cat ~/.darcs/typos
    # Enforce capitalization as well as correct spelling.
    Debian debian debain
    # Unicode is fun, too.
    한국어 혼극어

This is a table of correctly spelled words, followed by one or more
incorrect versions on the same line.  Darcs init has created the file
with the "tyop" line, and the end user has gotten annoyed with his
co-workers and added some extra entries.  The user can also have a
personal typo blacklist.

    $ darcs rec -am "Fix colour blindness issue."

No change here, because there are no blacklisted words.

    $ darcs rec -a
    What is the patch name? Fix color blindness issue.

    Warning: possible typographical error (typo) detected!
    You said `color' but probably meant `colour'.

    Change patch name? [Y/n/q]

The user can choose N to override this warning, Y to go back to the
patch name step, or q to abort the entire record operation.

    $ darcs rec -am 'Fix color blindness issue.'
    Warning: possible typographical error (typo) detected!
    You said `color' but probably meant `colour'.
    Continuing anyway.

Because used in "non-interactively", this just prints a warning and
continues on anyway.

Note that I haven't shown it here, but I envisage this feature also
checking the patch description against the typo blacklist.  It would
*NOT* check the patch body (i.e. the hunks of changed lines and
suchlike).

We can probably harvest the output of "darcs changes -s" on some large
repositories like darcs and ghc to create a useful default blacklist.

PS: this feature was inspired by lintian, which does the same thing
for debian changelog entries.
msg6113 (view) Author: droundy Date: 2008-09-23.22:08:45
Sounds like a nice idea.  I'm not sure how best to implement this sort of
configuration.  It's the sort of thing that setpref is designed for, but setpref
is poorly designed, so I'd rather avoid introducing new uses for it, if possible.

Perhaps the best way to start out would be to have the file hardcoded as
something like .darcs.prefs/spelling-corrections (and also search
~/.darcs/spelling-corrections).  This isn't very flexible, but does allow you to
either define your spelling corrections locally or to include them in the
repository.  The issue, of course, is that you might like to have new spelling
corrections pulled from one repository to another.

David
msg6125 (view) Author: twb Date: 2008-09-23.23:08:15
On Tue, Sep 23, 2008 at 10:08:46PM -0000, David Roundy wrote:
> The issue, of course, is that you might like to have new spelling
> corrections pulled from one repository to another.

Indeed, it's analogous to the ~/.darcs/boring, _darcs/prefs/boring and
set pref boringfile; I agree that supporting version control of the
blacklist file can be ignored until the actual blacklisting is
working.
msg6603 (view) Author: twb Date: 2008-11-06.05:21:12
The inspiration for this feature is a module from the tool "lintian",
which performs a large range of quality assurance tests on Debian
packages.  Attached is the module in question; it checks for a
hard-coded blacklist of English language typos.

I intend to start by translating this into a Haskell module, then
maybe seeing if it can be integrted into Darcs so that a warning is
issued at record/amend time if a typo is detected in the patch name or
patch description.

At this time, I have no intention of implementing user-extensible
blacklists, or of allowing the warning to be turned into an error (so
that misspelt patches aren't recorded, much as patches aren't recorded
if they fail --test).
Attachments
msg6604 (view) Author: twb Date: 2008-11-06.07:33:49
Thu Nov  6 18:33:34 EST 2008  Trent W. Buck <trentbuck@gmail.com>
  * Blacklist-based spell-checking module, first attempt.
Attachments
msg6605 (view) Author: dagit Date: 2008-11-06.08:13:06
On Wed, Nov 5, 2008 at 11:33 PM, Trent Buck <bugs@darcs.net> wrote:
>
> Trent Buck <trentbuck@gmail.com> added the comment:
>
> Thu Nov  6 18:33:34 EST 2008  Trent W. Buck <trentbuck@gmail.com>
>  * Blacklist-based spell-checking module, first attempt.

Seems fine, but I do hope that eventually this can get turned into a
per-repo pref just like boring/binary.  I read that you have no plans
for that, but I think it would be quite easy to do given the existing
infrastructure.

Anyway, seems like a reasonable start.

Thanks!
Jason
msg6606 (view) Author: twb Date: 2008-11-06.08:19:26
On Thu, Nov 06, 2008 at 12:12:55AM -0800, Jason Dagit wrote:
> [...] I do hope that eventually this can get turned into a per-repo
> pref just like boring/binary.  I read that you have no plans for
> that, but I think it would be quite easy [...]

No plans in the short term, anyway.  By that I meant that I'm focusing
on getting it working *at all*.
msg6607 (view) Author: kowey Date: 2008-11-06.10:54:24
On Thu, Nov 06, 2008 at 07:33:51 -0000, Trent Buck wrote:
> Thu Nov  6 18:33:34 EST 2008  Trent W. Buck <trentbuck@gmail.com>
>   * Blacklist-based spell-checking module, first attempt.

I'm assuming this is just a draft, and am going to let others comment
on this.

You know the drill, when we are ready to go.
msg10579 (view) Author: kowey Date: 2010-03-30.13:32:26
Hi Trent, I'd like to recommend we wont-fix this as part of our attempts
to consciously resist feature creep.

Perhaps this is something that can really be deferred to a record posthook?
History
Date User Action Args
2008-09-23 03:28:02twbcreate
2008-09-23 22:08:46droundysetpriority: feature
nosy: + droundy
status: unread -> unknown
messages: + msg6113
2008-09-23 23:08:17twbsetnosy: droundy, kowey, dagit, simon, twb
messages: + msg6125
2008-11-06 05:21:14twbsetfiles: + Spelling.pm
assignedto: twb
messages: + msg6603
nosy: + dmitry.kurochkin, thorkilnaur
2008-11-06 07:33:51twbsetfiles: + blacklist_based-spell_checking-module_-first-attempt_.dpatch, unnamed
nosy: droundy, kowey, dagit, simon, twb, thorkilnaur, dmitry.kurochkin
messages: + msg6604
2008-11-06 08:13:08dagitsetnosy: droundy, kowey, dagit, simon, twb, thorkilnaur, dmitry.kurochkin
messages: + msg6605
2008-11-06 08:19:28twbsetnosy: droundy, kowey, dagit, simon, twb, thorkilnaur, dmitry.kurochkin
messages: + msg6606
2008-11-06 10:54:26koweysetnosy: droundy, kowey, dagit, simon, twb, thorkilnaur, dmitry.kurochkin
messages: + msg6607
2008-11-06 11:58:37droundysetnosy: - droundy
2009-08-10 23:46:16adminsetnosy: - dagit
2009-08-20 02:48:54twbsetstatus: unknown -> has-patch
nosy: kowey, simon, twb, thorkilnaur, dmitry.kurochkin
2009-08-25 17:25:20adminsetnosy: + darcs-devel, - simon
2009-08-26 14:17:07koweysetnosy: kowey, darcs-devel, twb, thorkilnaur, dmitry.kurochkin
2009-08-27 14:28:24adminsetnosy: kowey, darcs-devel, twb, thorkilnaur, dmitry.kurochkin
2010-03-30 13:32:27koweysetstatus: has-patch -> wont-fix
messages: + msg10579