Created on 2008-09-23.03:28:02 by twb, last changed 2010-03-30.13:32:27 by kowey.
msg6091 (view) |
Author: twb |
Date: 2008-09-23.03:28:00 |
|
Darcs' powerful cherry-picking features make it important to avoid
typos in patch names. For example, if a user accidentally did
darcs rec -am 'debain: bump changelog to 2.0.3~pre1-1"
then later doing
darcs send -p ^debian:
would not match this patch. For inattentive users like myself, this
can actually result in data loss -- because I *think* all the
important patches have been sent/pushed, and I rm -rf my local copy.
There are two ways to do spellchecking:
- The checker has a whitelist of "correct" words; if its not in this
list, it's a mistake. This results in lots of false positives.
- The checker has a blacklist of common typos; if its not in this
list, it's correct. This results in lots of false negative.
On the mailing list the general consensus (which I agree with) is that
the former alternative is a lot of work and will annoy a lot of people
(particularly when you take into account multilingual environments).
I'd like to reiterate my argument for the latter alternative, because
the fallback behaviour -- false negatives -- is what darcs does now,
without any spellchecking!
Here's a mockup of how this feature might behave.
$ cat _darcs/typos
typo tyop
# Enforce en_GB-oed (Oxford English) conventions.
colour color
realize realise
$ cat ~/.darcs/typos
# Enforce capitalization as well as correct spelling.
Debian debian debain
# Unicode is fun, too.
한국어 혼극어
This is a table of correctly spelled words, followed by one or more
incorrect versions on the same line. Darcs init has created the file
with the "tyop" line, and the end user has gotten annoyed with his
co-workers and added some extra entries. The user can also have a
personal typo blacklist.
$ darcs rec -am "Fix colour blindness issue."
No change here, because there are no blacklisted words.
$ darcs rec -a
What is the patch name? Fix color blindness issue.
Warning: possible typographical error (typo) detected!
You said `color' but probably meant `colour'.
Change patch name? [Y/n/q]
The user can choose N to override this warning, Y to go back to the
patch name step, or q to abort the entire record operation.
$ darcs rec -am 'Fix color blindness issue.'
Warning: possible typographical error (typo) detected!
You said `color' but probably meant `colour'.
Continuing anyway.
Because used in "non-interactively", this just prints a warning and
continues on anyway.
Note that I haven't shown it here, but I envisage this feature also
checking the patch description against the typo blacklist. It would
*NOT* check the patch body (i.e. the hunks of changed lines and
suchlike).
We can probably harvest the output of "darcs changes -s" on some large
repositories like darcs and ghc to create a useful default blacklist.
PS: this feature was inspired by lintian, which does the same thing
for debian changelog entries.
|
msg6113 (view) |
Author: droundy |
Date: 2008-09-23.22:08:45 |
|
Sounds like a nice idea. I'm not sure how best to implement this sort of
configuration. It's the sort of thing that setpref is designed for, but setpref
is poorly designed, so I'd rather avoid introducing new uses for it, if possible.
Perhaps the best way to start out would be to have the file hardcoded as
something like .darcs.prefs/spelling-corrections (and also search
~/.darcs/spelling-corrections). This isn't very flexible, but does allow you to
either define your spelling corrections locally or to include them in the
repository. The issue, of course, is that you might like to have new spelling
corrections pulled from one repository to another.
David
|
msg6125 (view) |
Author: twb |
Date: 2008-09-23.23:08:15 |
|
On Tue, Sep 23, 2008 at 10:08:46PM -0000, David Roundy wrote:
> The issue, of course, is that you might like to have new spelling
> corrections pulled from one repository to another.
Indeed, it's analogous to the ~/.darcs/boring, _darcs/prefs/boring and
set pref boringfile; I agree that supporting version control of the
blacklist file can be ignored until the actual blacklisting is
working.
|
msg6603 (view) |
Author: twb |
Date: 2008-11-06.05:21:12 |
|
The inspiration for this feature is a module from the tool "lintian",
which performs a large range of quality assurance tests on Debian
packages. Attached is the module in question; it checks for a
hard-coded blacklist of English language typos.
I intend to start by translating this into a Haskell module, then
maybe seeing if it can be integrted into Darcs so that a warning is
issued at record/amend time if a typo is detected in the patch name or
patch description.
At this time, I have no intention of implementing user-extensible
blacklists, or of allowing the warning to be turned into an error (so
that misspelt patches aren't recorded, much as patches aren't recorded
if they fail --test).
Attachments
|
msg6604 (view) |
Author: twb |
Date: 2008-11-06.07:33:49 |
|
Thu Nov 6 18:33:34 EST 2008 Trent W. Buck <trentbuck@gmail.com>
* Blacklist-based spell-checking module, first attempt.
Attachments
|
msg6605 (view) |
Author: dagit |
Date: 2008-11-06.08:13:06 |
|
On Wed, Nov 5, 2008 at 11:33 PM, Trent Buck <bugs@darcs.net> wrote:
>
> Trent Buck <trentbuck@gmail.com> added the comment:
>
> Thu Nov 6 18:33:34 EST 2008 Trent W. Buck <trentbuck@gmail.com>
> * Blacklist-based spell-checking module, first attempt.
Seems fine, but I do hope that eventually this can get turned into a
per-repo pref just like boring/binary. I read that you have no plans
for that, but I think it would be quite easy to do given the existing
infrastructure.
Anyway, seems like a reasonable start.
Thanks!
Jason
|
msg6606 (view) |
Author: twb |
Date: 2008-11-06.08:19:26 |
|
On Thu, Nov 06, 2008 at 12:12:55AM -0800, Jason Dagit wrote:
> [...] I do hope that eventually this can get turned into a per-repo
> pref just like boring/binary. I read that you have no plans for
> that, but I think it would be quite easy [...]
No plans in the short term, anyway. By that I meant that I'm focusing
on getting it working *at all*.
|
msg6607 (view) |
Author: kowey |
Date: 2008-11-06.10:54:24 |
|
On Thu, Nov 06, 2008 at 07:33:51 -0000, Trent Buck wrote:
> Thu Nov 6 18:33:34 EST 2008 Trent W. Buck <trentbuck@gmail.com>
> * Blacklist-based spell-checking module, first attempt.
I'm assuming this is just a draft, and am going to let others comment
on this.
You know the drill, when we are ready to go.
|
msg10579 (view) |
Author: kowey |
Date: 2010-03-30.13:32:26 |
|
Hi Trent, I'd like to recommend we wont-fix this as part of our attempts
to consciously resist feature creep.
Perhaps this is something that can really be deferred to a record posthook?
|
|
Date |
User |
Action |
Args |
2008-09-23 03:28:02 | twb | create | |
2008-09-23 22:08:46 | droundy | set | priority: feature nosy:
+ droundy status: unread -> unknown messages:
+ msg6113 |
2008-09-23 23:08:17 | twb | set | nosy:
droundy, kowey, dagit, simon, twb messages:
+ msg6125 |
2008-11-06 05:21:14 | twb | set | files:
+ Spelling.pm assignedto: twb messages:
+ msg6603 nosy:
+ dmitry.kurochkin, thorkilnaur |
2008-11-06 07:33:51 | twb | set | files:
+ blacklist_based-spell_checking-module_-first-attempt_.dpatch, unnamed nosy:
droundy, kowey, dagit, simon, twb, thorkilnaur, dmitry.kurochkin messages:
+ msg6604 |
2008-11-06 08:13:08 | dagit | set | nosy:
droundy, kowey, dagit, simon, twb, thorkilnaur, dmitry.kurochkin messages:
+ msg6605 |
2008-11-06 08:19:28 | twb | set | nosy:
droundy, kowey, dagit, simon, twb, thorkilnaur, dmitry.kurochkin messages:
+ msg6606 |
2008-11-06 10:54:26 | kowey | set | nosy:
droundy, kowey, dagit, simon, twb, thorkilnaur, dmitry.kurochkin messages:
+ msg6607 |
2008-11-06 11:58:37 | droundy | set | nosy:
- droundy |
2009-08-10 23:46:16 | admin | set | nosy:
- dagit |
2009-08-20 02:48:54 | twb | set | status: unknown -> has-patch nosy:
kowey, simon, twb, thorkilnaur, dmitry.kurochkin |
2009-08-25 17:25:20 | admin | set | nosy:
+ darcs-devel, - simon |
2009-08-26 14:17:07 | kowey | set | nosy:
kowey, darcs-devel, twb, thorkilnaur, dmitry.kurochkin |
2009-08-27 14:28:24 | admin | set | nosy:
kowey, darcs-devel, twb, thorkilnaur, dmitry.kurochkin |
2010-03-30 13:32:27 | kowey | set | status: has-patch -> wont-fix messages:
+ msg10579 |
|