Enhance Python Code Text Splitting #23

Open
skyl opened this issue Nov 10, 2024 · 0 comments

skyl commented Nov 10, 2024

Objective

Improve the current Python code splitting mechanism by experimenting with more sophisticated methods, such as AST (Abstract Syntax Tree) based splitting, or by reconfiguring the existing tools to perform better.

Background

py/packages/corpora_ai/split.py currently uses PythonCodeTextSplitter from the langchain_text_splitters library to split Python code. This may not be optimal: the splitter works from text separators and a chunk-size limit rather than from parsed syntax, so it can cut through the middle of a function or class.

Task

  1. Research Alternatives:

    • Explore options for utilizing AST-based splitting to handle Python syntax more effectively.
    • Investigate other third-party libraries that offer advanced code splitting capabilities.
  2. Configuration:

    • Review the current configuration of PythonCodeTextSplitter and identify potential enhancements or settings that optimize its performance with Python code.
  3. Implementation:

    • Experiment with different text splitting mechanisms for Python code using AST or reconfigured existing methods.
    • Ensure the new method integrates seamlessly with the existing codebase.
  4. Testing and Comparison:

    • Develop test cases to validate the new splitting method against diverse Python code snippets.
    • Compare the results with the current method to evaluate improvements in clarity and logic separation.
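As a starting point for the AST experiment in steps 1 and 3, here is a minimal sketch that uses only the standard library `ast` module. The function name `split_by_top_level_defs` is hypothetical and is not part of corpora_ai or langchain_text_splitters. It assumes the input parses as valid Python and that one chunk per top-level function or class is a useful unit; it does not enforce any chunk-size limit, so very large classes would still need a second pass.

```python
import ast


def split_by_top_level_defs(source: str) -> list[str]:
    """Split source into chunks that end on top-level statement boundaries.

    Each top-level function or class becomes its own chunk. Runs of other
    top-level statements (imports, assignments, calls) are grouped together.
    Comments, blank lines, and decorators that precede a definition stay
    attached to that definition, and no text is dropped.
    """
    tree = ast.parse(source)  # raises SyntaxError if the source is invalid
    lines = source.splitlines(keepends=True)
    block_types = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)

    chunks: list[str] = []
    start = 0  # 0-based index of the first line not yet emitted
    body = tree.body
    for i, node in enumerate(body):
        is_last = i == len(body) - 1
        next_is_block = not is_last and isinstance(body[i + 1], block_types)
        # Close a chunk after every definition, before every definition,
        # and at the end of the module.
        if isinstance(node, block_types) or next_is_block or is_last:
            end = len(lines) if is_last else node.end_lineno
            chunks.append("".join(lines[start:end]))
            start = end

    # A file with only comments or whitespace has an empty body.
    if not chunks and source:
        chunks.append(source)
    return chunks
```

Because every chunk boundary falls on the end of a top-level statement, joining the chunks reproduces the original file exactly, which makes the output easy to diff against the current splitter. `end_lineno` requires Python 3.8 or later.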

Acceptance Criteria

  • A summary document comparing different text splitting methods and their performance.
  • Code implementation demonstrating potential improvements and comparison with existing methods.
  • Insight into whether the new method offers better logical separation in Python code splitting.
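One possible way to put a number on "better logical separation" is to measure how many chunks are syntactically complete on their own. The sketch below is an assumption about what a useful metric could look like, not an agreed criterion; `is_parseable` and `parseable_fraction` are hypothetical helper names. It works on any list of chunk strings, so it can score the output of the current splitter and of any candidate replacement.

```python
import ast
import textwrap


def is_parseable(chunk: str) -> bool:
    """Return True if the chunk parses as Python on its own.

    The chunk is dedented first so that an indented but otherwise complete
    block (for example a method cut out of a class) is not penalised.
    """
    try:
        ast.parse(textwrap.dedent(chunk))
    except (SyntaxError, ValueError):
        return False
    return True


def parseable_fraction(chunks: list[str]) -> float:
    """Fraction of chunks that are syntactically complete (0.0 if empty)."""
    if not chunks:
        return 0.0
    return sum(is_parseable(c) for c in chunks) / len(chunks)
```

A higher fraction suggests fewer chunks that start or end mid-statement. It says nothing about chunk size or retrieval quality, so it would sit alongside the other comparisons in the summary document rather than replace them.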