Support for various document upload formats #571

Open
Padmaapparao opened this issue Nov 15, 2024 · 2 comments

Comments

@Padmaapparao

We need to be able to upload "forms" as text or Word documents, scanned images, PDF documents, JSON files, JPEG/PNG images, MP4 and other video clips, and audio clips for tasks such as Q&A and summarization with OPEA RAG.
The pipeline should be able to consume any type of upload and extract the content (chunk...)
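
As a rough illustration of what "consume any type of upload and extract the content" could mean, here is a minimal sketch. It is not OPEA code: `EXTRACTOR_BY_SUFFIX`, `classify_upload`, and `chunk_text` are hypothetical names, the suffix list is an assumption based on the formats listed above, and the chunk size and overlap defaults are arbitrary. It only routes a file to a coarse extractor category by file extension and splits already-extracted text into overlapping chunks; the actual extraction (OCR, speech-to-text, and so on) is out of scope.

```python
from pathlib import Path

# Hypothetical mapping from file suffix to a coarse extractor category.
# The categories are illustrative and do not name real OPEA components.
EXTRACTOR_BY_SUFFIX = {
    ".txt": "text",
    ".json": "text",
    ".pdf": "document",
    ".docx": "document",
    ".jpg": "image",
    ".jpeg": "image",
    ".png": "image",
    ".mp3": "audio",
    ".wav": "audio",
    ".mp4": "video",
}


def classify_upload(filename: str) -> str:
    """Return the extractor category for a file name, or "unsupported"."""
    suffix = Path(filename).suffix.lower()
    return EXTRACTOR_BY_SUFFIX.get(suffix, "unsupported")


def chunk_text(text: str, size: int = 200, overlap: int = 20) -> list[str]:
    """Split extracted text into fixed-size chunks that overlap by `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


if __name__ == "__main__":
    for name in ["form.docx", "scan.PNG", "interview.mp4", "notes.xyz"]:
        print(name, "->", classify_upload(name))
    print(chunk_text("abcdefghij", size=4, overlap=1))
```

Routing on the file extension keeps the sketch deterministic; a real pipeline would more likely inspect the content type of the upload and hand each category to a dedicated extraction service.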

@yongfengdu
Collaborator

There are multiple dataprep components here (not yet supported by the Helm chart deployment). Do any of them satisfy your requirement?
https://github.com/opea-project/GenAIComps/tree/main/comps/dataprep

@eero-t
Contributor

eero-t commented Nov 22, 2024

Discussed this recently with Padma.

@yongfengdu The DocSum application currently supports PDF, docx, audio, and MP4 video formats.

However, while the data-prep service may also support images, the DocSum app does not currently use the data-prep service.

PS. This ticket would be more relevant for the Comps (or Examples) repo, where such support is implemented, than for this (k8s integration) one.

3 participants