Refactor NgramTokenizer #4031
If you can provide an example, I can help with the rest.
If we decide to replace the dependency, this would be about 5 lines of code: https://pytorch.org/text/stable/_modules/torchtext/data/utils.html#ngrams_iterator torchtext is used here:
Can we just copy the code over?
Yeah, that would probably be the solution for this tokenizer.
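For reference, here is a minimal sketch of what a local replacement could look like. It is not the torchtext source copied verbatim. It assumes the helper should yield every single token first and then each longer n-gram, up to the requested size, as a space-joined string. The name `ngrams_iterator` and its signature are chosen here to mirror the linked helper, and how it would be wired into NgramTokenizer is not shown.

```python
from typing import Iterator, List


def ngrams_iterator(token_list: List[str], ngrams: int) -> Iterator[str]:
    """Yield all tokens, then all n-grams for n = 2..ngrams.

    Assumption: n-grams are emitted as space-joined strings, grouped by size.
    """
    # Unigrams first: every token as-is.
    yield from token_list
    # Then each larger n-gram, sliding a window of width n over the tokens.
    for n in range(2, ngrams + 1):
        for i in range(len(token_list) - n + 1):
            yield " ".join(token_list[i : i + n])


print(list(ngrams_iterator(["here", "we", "are"], 2)))
# → ['here', 'we', 'are', 'here we', 'we are']
```

Because this only uses the standard library, the tokenizer would no longer need the torchtext import for this step.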
The NgramTokenizer uses torchtext. We want to remove torchtext as a dependency, so this tokenizer has to be refactored so that it no longer relies on it.