Tokenizer Import Error When Using Ollama Models #766
Comments
2024-10-24 13:14:26,930 app.service.search_graph_service:66 -> (ERROR) Error in searching process: Could not import transformers python package. This is needed in order to calculate get_token_ids. Please install it with `pip install transformers`. The above exception was the direct cause of the following exception: Traceback (most recent call last):
configurations: DEFAULT_SEARCH_GRAPH_CONFIG = {
@AnukaMithara have you run the command `pip install transformers`?
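The configuration snippet in the comment above is truncated. For context, a graph configuration for Ollama in ScrapeGraphAI typically looks something like the sketch below; the model names, base_url, and other values are illustrative assumptions, not the reporter's actual settings.

```python
# Illustrative ScrapeGraphAI-style graph config for a local Ollama server.
# All values here are assumptions for context, not taken from the report.
DEFAULT_SEARCH_GRAPH_CONFIG = {
    "llm": {
        "model": "ollama/llama3",  # the issue also reproduces with llama3.1 and mistral
        "temperature": 0,
        "base_url": "http://localhost:11434",
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        "base_url": "http://localhost:11434",
    },
    "verbose": True,
}
```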
Tokenizer Import Error When Using Ollama Models
Description
When attempting to use Ollama models (llama3, llama3.1, mistral), the application fails due to a tokenizer import error. The error occurs when calculating token counts for text chunking operations. It works fine with OpenAI models.
Environment
Error Message
The root cause appears to be that LangChain falls back to GPT2TokenizerFast for token counting when a model class does not provide its own get_token_ids implementation, which is the case for the Ollama models.
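For reference, that fallback lives in langchain_core's BaseLanguageModel. A paraphrased sketch (exact code varies between LangChain versions) of what happens when a model class does not override get_token_ids:

```python
# Paraphrased sketch of langchain_core/language_models/base.py; the exact
# code differs between versions, but the error message matches the one above.
from functools import lru_cache

@lru_cache(maxsize=None)
def get_tokenizer():
    try:
        from transformers import GPT2TokenizerFast
    except ImportError as e:
        raise ImportError(
            "Could not import transformers python package. "
            "This is needed in order to calculate get_token_ids. "
            "Please install it with `pip install transformers`."
        ) from e
    return GPT2TokenizerFast.from_pretrained("gpt2")

def _get_token_ids_default_method(text: str) -> list[int]:
    # Models that do not provide their own tokenizer (e.g. the Ollama
    # classes) fall through to this GPT-2 tokenizer, triggering the import.
    return get_tokenizer().encode(text)
```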
Full Traceback
The error originates in langchain_core/language_models/base.py and propagates through the text chunking functionality:
- scrapegraphai/utils/tokenizer.py
- scrapegraphai/utils/split_text_into_chunks.py
- semchunk/semchunk.py
Steps to Reproduce
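A minimal reproduction consistent with the traceback, assuming an Ollama server running locally, the langchain_ollama package installed, and transformers not installed (the exact model class and name are assumptions, not from the original report):

```python
# Minimal sketch that should raise the ImportError when `transformers`
# is absent. ChatOllama and "llama3" are assumptions for illustration.
from langchain_ollama import ChatOllama

llm = ChatOllama(model="llama3")

# get_num_tokens -> get_token_ids -> default GPT2TokenizerFast fallback,
# which raises: ImportError: Could not import transformers python package...
llm.get_num_tokens("Some text the chunker needs to measure.")
```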
Expected Behavior
The application should handle token counting for Ollama models without requiring the transformers package, or should fall back to an alternative tokenizer implementation.
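If I read langchain_core correctly, BaseLanguageModel already exposes a custom_get_token_ids hook that is checked before the GPT2TokenizerFast fallback, so a user-side workaround could look like the sketch below. The 4-characters-per-token heuristic is a rough assumption, not a real Llama tokenizer:

```python
# Workaround sketch: supply custom_get_token_ids so LangChain never reaches
# the transformers-backed fallback. The chars/4 heuristic is an assumption.
from langchain_ollama import ChatOllama

def approximate_token_ids(text: str) -> list[int]:
    # Only the token *count* matters for chunking, so fake IDs suffice.
    return list(range(max(1, len(text) // 4)))

llm = ChatOllama(model="llama3", custom_get_token_ids=approximate_token_ids)

print(llm.get_num_tokens("hello world"))  # counts via the approximation
```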
Current Behavior
The application fails with an ImportError that suggests installing the transformers package, which may not be the correct fix for Ollama models (a GPT-2 tokenizer does not match Llama-family vocabularies in any case).
Possible Solutions
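One direction consistent with the "alternative tokenizer" suggestion above (my sketch, not from the original issue): have the chunking utilities pass their own token counter to semchunk instead of calling llm.get_num_tokens. As I understand semchunk's API, chunk() accepts a token_counter callable, so something like this would avoid the transformers dependency entirely:

```python
# Sketch: give semchunk an approximate token counter so chunking never
# touches llm.get_num_tokens. Heuristic and chunk size are assumptions.
import semchunk

def approx_token_count(text: str) -> int:
    return max(1, len(text) // 4)

chunks = semchunk.chunk(
    "A long document that needs splitting before being sent to the LLM...",
    chunk_size=512,
    token_counter=approx_token_count,
)
print(len(chunks))
```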
Additional Notes
This appears to be a broader issue with how LangChain handles tokenization for Ollama models, as the error is consistent across multiple Ollama models (llama3, llama3.1, mistral).