add correct mecab installation instructions #132
TL;DR
I am opening this PR to keep people from running into issues like #54 and #18.
Update: I also strongly suspect that issues such as #111 were caused by an incorrectly configured MeCab (built with the default encodings). This may cause an assertion in fastBPE.hpp (line 480) to fail, which results in a failure to produce output files after the fastBPE step.
At least with my system locale, omitting any one of the UTF-8-enabling flags (see install_external_tools.sh) led to empty outputs in the embed task, encoding errors (at $LASER/source/lib/romanize_lc.py), and much confusion. Regrettably, this is hard to know until you actually run into the problem.
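For anyone who wants to check an existing installation before re-running the pipeline, here is a minimal sketch, not part of this PR. It assumes that `mecab -D` (dictionary info) prints a `charset:` line for each loaded dictionary; the helper names `parse_charsets`, `is_utf8`, and `mecab_dictionaries_are_utf8` are hypothetical and exist only for illustration.

```python
import shutil
import subprocess


def parse_charsets(dictionary_info):
    """Collect the value of every 'charset:' line in `mecab -D` style output."""
    charsets = []
    for line in dictionary_info.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "charset":
            charsets.append(value.strip())
    return charsets


def is_utf8(charset):
    """Treat 'UTF-8', 'utf8', and 'UTF_8' as the same encoding name."""
    return charset.lower().replace("-", "").replace("_", "") == "utf8"


def mecab_dictionaries_are_utf8():
    """Return True only if mecab reports at least one dictionary and all are UTF-8.

    Assumption: the installed mecab binary supports `-D` and reports the
    dictionary charset in its output.
    """
    mecab = shutil.which("mecab")
    if mecab is None:
        raise RuntimeError("mecab not found on PATH")
    result = subprocess.run(
        [mecab, "-D"], capture_output=True, text=True, check=True
    )
    charsets = parse_charsets(result.stdout)
    return bool(charsets) and all(is_utf8(c) for c in charsets)
```

If `mecab_dictionaries_are_utf8()` returns False, the dictionary was most likely built with a non-UTF-8 charset, and rebuilding with the flags in install_external_tools.sh is probably the fix.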
I also changed README.md slightly, so that MeCab hopefully comes across as more clearly optional for people who are not working with Japanese.
Additional question: Why was the automatic installation of MeCab dropped?