Typo checker integration #268

Open
Peque opened this issue Mar 22, 2018 · 6 comments

Comments

@Peque
Member

Peque commented Mar 22, 2018

Maybe we could integrate a typo checker into the test suite. I know they exist, but I have not looked into them yet.

I have previously used checkpatch.pl, the script used in the Linux kernel, in non-Python projects, but hopefully there is a nice Python package already out there... 😄

Since I frequently make typos when writing in English, this might be important... And I will learn English too! 😂

@Peque Peque added this to the 0.7.0 milestone Mar 22, 2018
@ocaballeror
Contributor

ocaballeror commented Mar 26, 2018

I've been looking into this for a while, but didn't find much.

The ideal thing would be a plugin for pytest that could do the job, but I didn't find any, which is kind of a bummer.

The closest thing I found to what I had in mind is scspell, which is run as an external tool, but it will take a bit of work, since we would need to create a personalized dictionary with all the specific words we use (osbrain, pytest, nameserver...).

I may revisit this in the future, but it's not really a priority right now.

@Peque Peque modified the milestones: 0.7.0, 0.8.0 Mar 26, 2018
@ocaballeror
Contributor

ocaballeror commented Apr 4, 2018

I ran a manual check with scspell and went through the list of results, manually picking out which ones were actual typos. The results are in #285.

I ran it without any configuration, so most of the results it reported were simply unknown words and odd variable names, which is the main reason this would take a while to implement. I could have created a custom dictionary with the words it should accept as valid, but that would be hard to maintain in the future and would probably create lots of commits simply named "updated dictionary".

Unless we find a way to store the dictionary file outside of the repo, this doesn't look very promising.

@Peque
Member Author

Peque commented Apr 4, 2018

Yeah, I was thinking more about the approach in Linux's checkpatch.pl. They do not have a dictionary with all the valid English words; instead, they have a dictionary of common typos, so they only report an error when it is very likely to be one.

In order to tokenize everything, we could split the text on any non-letter character (spaces, digits, underscores...) and check each token against the common-typos dictionary. We could go further (thinking about class names) and also split camel-case words.
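A rough sketch of that idea in Python, just to make it concrete. The `COMMON_TYPOS` entries and the `tokenize` / `find_typos` names are made up for illustration; a real tool would load a much larger typo list from a file.

```python
import re

# Hypothetical sample of a common-typos dictionary (typo -> correction).
COMMON_TYPOS = {
    "recieve": "receive",
    "seperate": "separate",
    "occured": "occurred",
    "adress": "address",
}


def tokenize(text):
    """Split text into lowercase words on non-letters and camel-case boundaries."""
    words = []
    # First split on anything that is not a letter (spaces, digits, underscores...).
    for chunk in re.split(r"[^A-Za-z]+", text):
        # Then split camel case: "RecieveMessage" -> "Recieve", "Message".
        # Runs of capitals such as "HTTP" in "HTTPServer" are kept as one word.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", chunk))
    return [word.lower() for word in words]


def find_typos(text):
    """Return (typo, suggestion) pairs for tokens found in the typo dictionary."""
    return [(word, COMMON_TYPOS[word]) for word in tokenize(text) if word in COMMON_TYPOS]


if __name__ == "__main__":
    for typo, fix in find_typos("def RecieveMessage(adress_list): pass"):
        print(f"{typo} -> {fix}")
```

Since only known typos are reported, project-specific words like osbrain or nameserver never show up as errors, so there is no custom dictionary to maintain.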

This would be awesome to have as a separate package, maybe integrated with flake8. I am currently busy with other projects in my spare time, but I might, at some point, spend some time on it. I do not think that would be before June though... 😂

@ocaballeror
Contributor

Yeah, that would be awesome!

I'm still kind of surprised nobody has done anything like this before. I mean, somebody else must have run into the same issue at some point.

@Peque
Member Author

Peque commented Apr 5, 2018

@ocaballeror Literally everybody else. I know of no project without a "Fix typo" commit in its history. 😂

@ocaballeror
Contributor

https://github.com/search?q=fix+typo&type=Commits

I just looked it up and GitHub reports 54,166,865 commit messages with the words "fix typo" 😆 😆

By the way, when doing that search I also stumbled upon this: https://github.com/intgr/topy
It looks better than scspell, since it works by recognizing common typos instead of using a standard dictionary.

I just tried it out and it looks very easy to integrate. You just run one command and it generates a patch for your project with the typo corrections. You can even tell it to apply those corrections automatically if you want. The best thing is that it only reported a couple of false positives, so it looks reliable enough in my opinion.

2 participants