Add spellchecker runner and initial generated dictionary (ru) #2720

Open
wants to merge 1 commit into
base: master

razum2um
Contributor

@razum2um razum2um commented Oct 31, 2021

I suggest using yaspeller to spell-check the Markdown content.

  • Russian maintainers are expected to run it locally, so the Rakefile is changed.
  • They also need to share a list of exceptions (a dictionary: an array of words or regexps), so a new JSON file is added.

Usage:

npm i -g yaspeller
rake 'check:spelling[ru]'

Default output looks like this:

www.ruby-lang.org/ru/news/_posts/2013-02-24-ruby-2-0-0-p0-is-released.md 170 ms
-----
Typos: 2
1. эксперемент (73:32, suggest: эксперимент)
2. отправлии (171:1, suggest: отправили)

www.ruby-lang.org/ru/news/_posts/2013-12-21-ruby-version-policy-changes-with-2-1-0.md 191 ms
-----
Typos: 2
1. yдаление (42:3, en: y*******, ru: *даление, suggest: удаление)
2. oбратно (43:3, en: o******, ru: *братно, suggest: обратно, обратной)
-----

All of those errors are fixed in the related PR: #2719

Note that it suggests replacements and also catches characters from the wrong alphabet, such as the Latin "y" and "o" in the Cyrillic words above.
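
To illustrate what such a mismatch looks like, here is a minimal Python sketch that flags words mixing Latin and Cyrillic letters. It is not how yaspeller detects them; the function names `letter_script` and `mixed_script_words` are hypothetical and exist only for this example.

```python
import re
import unicodedata
from typing import List


def letter_script(ch: str) -> str:
    """Classify a letter as 'latin', 'cyrillic', or 'other' by its Unicode name."""
    name = unicodedata.name(ch, "")
    if name.startswith("LATIN"):
        return "latin"
    if name.startswith("CYRILLIC"):
        return "cyrillic"
    return "other"


def mixed_script_words(text: str) -> List[str]:
    """Return the words that contain both Latin and Cyrillic letters."""
    found = []
    # [^\W\d_]+ matches runs of Unicode letters (no digits, no underscores).
    for word in re.findall(r"[^\W\d_]+", text):
        scripts = {letter_script(ch) for ch in word}
        if {"latin", "cyrillic"} <= scripts:
            found.append(word)
    return found
```

Calling `mixed_script_words` on a line that contains "yдаление" with a Latin first letter returns that word, while the correctly spelled "удаление" is left alone.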

Update dictionary from scratch

Use this repo (the commands below assume it is checked out as ../yaspeller-dictionary-builder):

rm lib/spelling/ru/dictionary.json
rake 'check:spelling[ru,json]' # generates ./yaspeller_report.json which we git-ignore
cd ../yaspeller-dictionary-builder
python src/dictionary.py ../www.ruby-lang.org/yaspeller_report.json > ../www.ruby-lang.org/lib/spelling/ru/dictionary.json
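
For readers who want to see roughly what the last step does, here is a minimal Python sketch. It is not the actual `src/dictionary.py`. It assumes that each typo in the JSON report is an object with a `word` key, and to stay independent of the exact report layout it simply walks the whole JSON tree. The names `iter_words`, `build_dictionary`, and `dump_dictionary` are hypothetical.

```python
import json
from typing import Any, Iterator, List


def iter_words(node: Any) -> Iterator[str]:
    """Yield every string stored under a "word" key anywhere in the JSON tree."""
    if isinstance(node, dict):
        word = node.get("word")  # assumption: typos are reported under "word"
        if isinstance(word, str):
            yield word
        for value in node.values():
            yield from iter_words(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_words(item)


def build_dictionary(report: Any) -> List[str]:
    """Return the unique flagged words, sorted, as a plain list."""
    return sorted(set(iter_words(report)))


def dump_dictionary(report_path: str) -> str:
    """Read a report file and return the dictionary as a JSON array string."""
    with open(report_path, encoding="utf-8") as fh:
        report = json.load(fh)
    # ensure_ascii=False keeps Cyrillic words readable in the output file.
    return json.dumps(build_dictionary(report), ensure_ascii=False, indent=2)
```

Because every flagged word ends up in the list, a dictionary generated this way should be reviewed so that real typos are fixed in the content rather than whitelisted.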

Limitations:

  • Unfortunately, it supports only Russian, Ukrainian, and English. Should we consider adding it for en as well?
  • As the CLI utility is based on a free API and has rate limits, I don't put it into CI / the standard linter flow. Maybe a git hook makes sense, run only if a commit contains changes under ru.
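
The git hook idea could look roughly like the Python sketch below, which runs the Rake task from this PR only when staged Markdown files live under `ru/`. This is an assumption-laden sketch, not part of the PR: the function names are hypothetical, and it assumes the hook runs from the repository root with `git` and `rake` on the PATH.

```python
import subprocess
from typing import Iterable, List


def russian_markdown(paths: Iterable[str], prefix: str = "ru/") -> List[str]:
    """Keep only Markdown files under the Russian translation directory."""
    return [p for p in paths if p.startswith(prefix) and p.endswith(".md")]


def staged_paths() -> List[str]:
    """List files that are added, copied, or modified in the index."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()


def main() -> int:
    """Return the exit code the hook should finish with."""
    if not russian_markdown(staged_paths()):
        return 0  # nothing under ru/ is staged, so skip the spell check
    # Reuse the Rake task added in this PR instead of calling yaspeller directly.
    return subprocess.run(["rake", "check:spelling[ru]"]).returncode
```

A pre-commit script would end with `sys.exit(main())`, so a non-zero exit code from the spell check blocks the commit.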

@razum2um razum2um requested review from a team as code owners October 31, 2021 19:21
@razum2um razum2um changed the title Add spellchecker runner and initial generated dictionary Add spellchecker runner and initial generated dictionary (ru) Oct 31, 2021
@lex111
Member

lex111 commented Oct 31, 2021

Thanks for the suggestion, but I would prefer to add this check to CI (GitHub Actions) as a separate workflow.

As the CLI utility is based on a free API and has rate limits, I don't put it into CI

Actually, it is not a big deal: I use yaspeller on many projects and have never run into its rate limits. Moreover, it is possible to make yaspeller run only when files related to the Russian translation have changed.

@Nakilon
Contributor

Nakilon commented Nov 13, 2021

Personally I used hunspell like this:

hunspell -d dict-en-20210701/en_US,dict_ru_ru-aot-0.4.5/russian-aot -p my.txt -l ru/news/_posts/202* | sort | uniq

Their README says to download the dictionaries in .oxt format, rename them to .zip, and unpack them.
The my.txt in the example is a text file listing words to skip, for example:

Alexandr
CVE
Gemfile
RDoc
Savca
aycabta
bundler
lang
...

3 participants