[Bug]: inverted index does not support string longer than 65530 #37855
Comments
Is this referring to the length of the entire string or the length of a single word?
It's a constant, MAX_TOKEN_LEN, in tantivy, so I think it applies to a single word.
I will run some tests to check how this affects queries that use the inverted index.
/assign
How can a token be that long? We do need to tune the max length of the varchar field. Is there anything stopping us from increasing varchar to 256K or 1M?
/assign @sunby
We tested the inverted index with a string of length 65535, and this warning occurred.
I wrote a unit test to verify it. Strings longer than 65530 cannot be searched because tantivy drops them.
We use the "raw" tokenizer, which means tantivy does no tokenization, so the whole string is indexed as a single token.
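To make the effect on queries concrete, here is a minimal Python sketch of the behavior described in the comments above. It is a toy model and does not call tantivy: `raw_tokenize` and `build_index` are hypothetical helpers, and the limit is assumed to count UTF-8 bytes, which the thread does not state.

```python
# Toy model of the behavior described in this thread; it does not call tantivy.
MAX_TOKEN_LEN = 65530  # limit quoted in the issue title; assumed to count UTF-8 bytes


def raw_tokenize(text: str) -> list[str]:
    """Model of a "raw" tokenizer: the whole value becomes one token."""
    return [text]


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each token to the ids of the documents that contain it.

    Tokens longer than MAX_TOKEN_LEN are skipped, mirroring the drop
    reported in the comments above.
    """
    index: dict[str, set[int]] = {}
    for doc_id, text in docs.items():
        for token in raw_tokenize(text):
            if len(token.encode("utf-8")) > MAX_TOKEN_LEN:
                continue  # dropped without an error: the value never reaches the index
            index.setdefault(token, set()).add(doc_id)
    return index


docs = {1: "a" * 65530, 2: "b" * 65535}
index = build_index(docs)
found_short = index.get(docs[1], set())  # exact-match lookup for document 1
found_long = index.get(docs[2], set())   # exact-match lookup for document 2
```

In this model the 65530-character value is found by an exact-match lookup, while the 65535-character value is stored as a document but has no index entry, so the same lookup returns nothing.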
This seems to be a non-blocking issue. Is there anything blocking us if we want to grow the size of varchar to 256K? For example, do we use a limited number of bits to store the size?
Is there an existing issue for this?
Environment
Current Behavior
If you insert strings longer than 65530, Milvus does not return any warning or error, but tantivy's log prints a warning.
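Since nothing is reported to the caller, one possible stopgap is to check values on the client side before inserting. The sketch below is illustrative only: `find_oversized` is a hypothetical helper, not part of Milvus or tantivy, and it assumes the limit is measured in UTF-8 bytes.

```python
MAX_TOKEN_LEN = 65530  # assumed byte limit, taken from the issue title


def find_oversized(values: list[str], limit: int = MAX_TOKEN_LEN) -> list[int]:
    """Return the positions of values whose UTF-8 encoding exceeds the limit."""
    return [i for i, v in enumerate(values) if len(v.encode("utf-8")) > limit]


# The third value has only 40000 characters but encodes to 80000 bytes.
rows = ["short", "x" * 65531, "é" * 40000]
oversized = find_oversized(rows)
```

If the limit does count bytes, a check on character count alone would miss values with multi-byte characters, which is why the sketch encodes each value before measuring it.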
Expected Behavior
No response
Steps To Reproduce
No response
Milvus Log
No response
Anything else?
No response