
IndexOutOfBoundsException caused by short overflow in MeCabOovProvider when the OOV is very long #213

Closed
eiennohito opened this issue Jun 29, 2023 · 1 comment

@eiennohito
Collaborator

eiennohito commented Jun 29, 2023

When an OOV is very long (more than 2^15 bytes), e.g. '�' * 10923, Sudachi analysis fails with an exception caused by MeCabOovPlugin.
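The overflow arithmetic can be checked in isolation. This is a minimal sketch, not Sudachi code. It assumes, as the title and the later comment suggest, that the OOV length is stored as a Java short holding the UTF-8 byte count. U+FFFD encodes to 3 bytes in UTF-8, so 10923 copies take 32769 bytes, which exceeds Short.MAX_VALUE (32767). The class and method names are hypothetical, and Objects.checkFromToIndex only stands in for whatever index check the real code hits.

```java
import java.nio.charset.StandardCharsets;
import java.util.Objects;

public class OovLengthOverflow {
    // UTF-8 byte length of the OOV text, computed as an int (no overflow here).
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Hypothetical: the same length squeezed into a short field, as the issue describes.
    static short storedLength(String s) {
        return (short) utf8Length(s);
    }

    public static void main(String[] args) {
        String oov = "\uFFFD".repeat(10923); // 10923 chars, 3 UTF-8 bytes each
        System.out.println(utf8Length(oov));   // 32769
        System.out.println(storedLength(oov)); // -32767 (wrapped around)

        // Using the wrapped length as an end offset makes end < begin,
        // which an index check rejects with an IndexOutOfBoundsException.
        int begin = 0;
        int end = begin + storedLength(oov);
        try {
            Objects.checkFromToIndex(begin, end, utf8Length(oov));
        } catch (IndexOutOfBoundsException e) {
            System.out.println("IndexOutOfBoundsException");
        }
    }
}
```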

Proposed solution: limit the maximum OOV length to about 1024 characters.
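A cap like this could be sketched as follows. It is a hypothetical helper, not the plugin's actual code; the name MAX_OOV_CHARS and the choice to count UTF-16 units are assumptions. Each UTF-16 unit encodes to at most 3 UTF-8 bytes, so a 1024-unit OOV is at most 3072 bytes and always fits in a short. The helper also avoids cutting a surrogate pair in half.

```java
import java.nio.charset.StandardCharsets;

public class OovLengthCap {
    // Hypothetical limit, following the "~1024 chars" proposal.
    static final int MAX_OOV_CHARS = 1024;

    // Returns the end index of an OOV candidate starting at `begin`,
    // clamped so the candidate spans at most MAX_OOV_CHARS UTF-16 units.
    static int capOovEnd(CharSequence text, int begin, int end) {
        int capped = Math.min(end, begin + MAX_OOV_CHARS);
        // If the cut falls between a high and a low surrogate, back off by one
        // so the supplementary character is not split.
        if (capped < end && capped > begin
                && Character.isHighSurrogate(text.charAt(capped - 1))
                && Character.isLowSurrogate(text.charAt(capped))) {
            capped--;
        }
        return capped;
    }

    public static void main(String[] args) {
        String text = "\uFFFD".repeat(10923);
        int end = capOovEnd(text, 0, text.length());
        int bytes = text.substring(0, end).getBytes(StandardCharsets.UTF_8).length;
        System.out.println(end + " chars, " + bytes + " bytes"); // 1024 chars, 3072 bytes
    }
}
```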

@mh-northlander
Collaborator

This particular problem will be solved by #233, because LatticeNodeImpl.makeOov sets begin/end directly instead of using the OOV length stored as a short value in WordInfo.
The index form length inside the OOV WordInfo is still wrong, but it won't be used.
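The idea behind that fix can be illustrated with a hypothetical node class. This is not Sudachi's LatticeNodeImpl, and all names here are assumptions. Begin and end are stored as int offsets and the span length is derived from them. A short length field may still overflow, as noted above, but nothing reads it when slicing the input.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DirectOffsets {
    // Hypothetical OOV node: int offsets plus a legacy short length field.
    static final class OovNode {
        final int begin;
        final int end;
        final short storedLength; // may be wrong for long OOVs; never used below

        OovNode(int begin, int end, short storedLength) {
            this.begin = begin;
            this.end = end;
            this.storedLength = storedLength;
        }

        // Span length derived from the int offsets, not from storedLength.
        int length() {
            return end - begin;
        }
    }

    // Sets begin/end directly from the input offsets.
    static OovNode makeOov(int begin, int end) {
        return new OovNode(begin, end, (short) (end - begin));
    }

    public static void main(String[] args) {
        byte[] input = "\uFFFD".repeat(10923).getBytes(StandardCharsets.UTF_8);
        OovNode node = makeOov(0, input.length);
        byte[] surface = Arrays.copyOfRange(input, node.begin, node.end);
        System.out.println(node.length() + " " + surface.length + " " + node.storedLength);
        // 32769 32769 -32767
    }
}
```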

mh-northlander added this to the 0.8 milestone Nov 27, 2024