Google Docs support #1022

doberst · 2024-10-04T12:57:58Z

LLMWare provides extensive built-in parsing for Microsoft document types (PPTX, DOCX, and XLSX), but it does not currently offer a way to parse and ingest Google Docs, Slides, and Sheets, or to connect to Google Drive repositories for storing and accessing documents.

It would be great to have an integrated capability that supports parsing, text chunking, and ingestion of Google document types and repositories. The implementation could take several forms: a de novo parser/text chunker in Python or C/C++, or, more likely, an interface to an existing Google document parser, with the supporting code to integrate it seamlessly into LLMWare.
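
As a rough illustration of the "interface to an existing parser" route, one option would be to export each Google file to its Office equivalent and hand the result to the Office parsing LLMWare already has. This is only a sketch under stated assumptions, not something LLMWare provides today: `export_to_office` and the `fetch` callable are hypothetical names, and `fetch` stands in for whatever client code calls the Drive API's `files.export` method and returns the exported bytes. The MIME types are the standard Google Workspace and Office Open XML ones.

```python
from pathlib import Path
from typing import Callable

# Google Workspace MIME type -> (Office export MIME type, file extension).
EXPORT_MAP = {
    "application/vnd.google-apps.document": (
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        ".docx",
    ),
    "application/vnd.google-apps.spreadsheet": (
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
        ".xlsx",
    ),
    "application/vnd.google-apps.presentation": (
        "application/vnd.openxmlformats-officedocument.presentationml.presentation",
        ".pptx",
    ),
}


def export_to_office(
    file_id: str,
    name: str,
    google_mime: str,
    fetch: Callable[[str, str], bytes],
    out_dir: str,
) -> Path:
    """Save a Google Docs/Sheets/Slides file as DOCX/XLSX/PPTX.

    `fetch(file_id, export_mime)` is a hypothetical callable that returns
    the exported bytes, e.g. a thin wrapper around Drive's files.export.
    `name` is assumed to be safe to use as a file name.
    """
    if google_mime not in EXPORT_MAP:
        raise ValueError(f"No Office export known for {google_mime!r}")
    export_mime, ext = EXPORT_MAP[google_mime]
    target_dir = Path(out_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{name}{ext}"
    target.write_bytes(fetch(file_id, export_mime))
    return target
```

The exported files could then go through the existing PPTX/DOCX/XLSX path, so no new chunking logic is needed. The trade-off is that anything specific to the Google formats that does not survive the export would be lost.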

EricLiclair · 2024-10-21T16:46:14Z

@doberst This seems interesting to me. Could you shed some light on what you'd suggest for this?

1. Are there any specific libs that you recommend?
2. Is there any existing code/class/component/PR in llmware that could be referenced or extended to add support for GDocs?

I'll try to scope out, from my perspective, what changes to add and where, but it might be time-consuming since I'm new to this codebase.

Suggestions for pt. 2 would help speed up the scoping; pt. 1 will help in better aligning with the expected solution.

doberst added the "enhancement" (New feature or request) label on Oct 4, 2024
