use youtube chapter as hints and metadata in the youtube loader #7366
Comments
@thiswillbeyourgithub Can I give it a shot?
I'm hardly in a position to lead anything but sure, absolutely. Thanks a lot. I can happily give an opinion and a light review of the code though. Thanks again!
@thiswillbeyourgithub Oh okay! I have recently started contributing to open source, so I really want to contribute to LangChain. Do you by any chance know how and where issues are assigned? I am not very familiar with the repository.
I think you just have to familiarize yourself with the contributing guidelines and make a PR :)
@thiswillbeyourgithub So if we find an issue that nobody is working on, we can submit a PR directly without being assigned, right?
I think so, yeah.
Hi, @thiswillbeyourgithub! I'm Dosu, and I'm here to help the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale. From what I understand, you requested a feature to use YouTube chapters as hints and metadata in the YouTube loader. This would involve using chapter timecodes and titles to improve the quality of summarized transcripts by adding headers and maintaining context over time. You mentioned that you are unable to contribute to the implementation at the moment but wanted to share the idea. I noticed that AmanSal1 has expressed interest in working on this feature and asked for guidance on how to contribute. You responded by suggesting that they familiarize themselves with the contributing guidelines and make a pull request. AmanSal1 also asked if they can submit a PR for an unassigned issue, and you confirmed that it is possible. If this issue is still relevant to the latest version of the LangChain repository, please let us know by commenting on this issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days. Thank you for your understanding and contribution to the LangChain project! Best regards,
Yes this is still relevant
@baskaryan Could you please help @thiswillbeyourgithub with this issue? They have indicated that it is still relevant. Thank you!
I'm helping the LangChain team manage their backlog and am marking this issue as stale. The issue requests the use of YouTube chapter information in the YouTube loader to improve the quality of summarized transcripts. You had mentioned that you are unable to contribute at the moment but wanted to share the idea. A user named AmanSal1 has expressed interest in working on this feature and asked for guidance on how to contribute. Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you!
I still think incorporating chapters as metadata is a valuable feature. It would be even better if someone managed to insert the chapter transitions directly into the text using the timestamps.
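To make the idea of inserting chapter transitions into the text concrete, here is a minimal sketch. It is not LangChain's loader code: the function names `add_chapter_headers` and `format_timestamp` are hypothetical, and it assumes the transcript is a list of dicts with `text` and `start` (in seconds) keys and that the chapters are already available as `(start_seconds, title)` pairs.

```python
from typing import Dict, List, Tuple, Union

# Assumed segment shape: {"text": "...", "start": 12.3}; an illustration, not a fixed API.
Segment = Dict[str, Union[str, float]]


def format_timestamp(seconds: float) -> str:
    """Render seconds as M:SS, or H:MM:SS once the value reaches one hour."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def add_chapter_headers(
    segments: List[Segment], chapters: List[Tuple[int, str]]
) -> str:
    """Join transcript segments into one text, with a header line at each chapter start."""
    ordered = sorted(chapters)
    lines: List[str] = []
    next_chapter = 0
    for segment in sorted(segments, key=lambda s: s["start"]):
        # Emit every chapter that has started by the time this segment begins.
        while (
            next_chapter < len(ordered)
            and ordered[next_chapter][0] <= segment["start"]
        ):
            start, title = ordered[next_chapter]
            lines.append(f"\n## {title} ({format_timestamp(start)})")
            next_chapter += 1
        lines.append(str(segment["text"]))
    return "\n".join(lines).strip()
```

In this sketch a chapter that starts after the last transcript segment is simply never emitted, and the same `(start, title)` pairs could also be stored as document metadata instead of, or in addition to, being written into the text.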
@thiswillbeyourgithub I'd love to give this a shot. I modified this to extract the description, and I think a bit of regex should allow me to extract the timestamps from the description, if available.
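As a rough illustration of the regex approach mentioned above, the sketch below pulls chapter lines out of a description string. The name `parse_chapters` and the exact pattern are assumptions for illustration only, not the commenter's code. It only recognizes lines that begin with a `M:SS` or `H:MM:SS` timestamp followed by a title, so descriptions that format chapters differently would need a looser pattern.

```python
import re
from typing import List, Tuple

# Matches lines such as "0:00 Intro", "12:34 - Topic", or "1:02:03 Deep dive".
CHAPTER_LINE = re.compile(
    r"^\s*(?:(\d{1,2}):)?(\d{1,2}):(\d{2})\s*[-:]?\s*(.+?)\s*$"
)


def parse_chapters(description: str) -> List[Tuple[int, str]]:
    """Return (start_seconds, title) pairs for lines that look like chapter entries."""
    chapters: List[Tuple[int, str]] = []
    for line in description.splitlines():
        match = CHAPTER_LINE.match(line)
        if match is None:
            continue  # Not a timestamped line; ignore it.
        hours, minutes, seconds, title = match.groups()
        start = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
        chapters.append((start, title))
    return chapters
```

A line that merely starts with a clock time (for example a schedule entry) would also match, so a real implementation would probably want extra sanity checks, such as requiring ascending timestamps.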
It looks like someone already opened a PR for this feature in youtube-transcript-api (which I believe this loader uses). Not sure how active the maintainer of that project is. jdepoix/youtube-transcript-api#254
If anyone is still interested in YouTube chapter-aware subtitles, I implemented this as part of wdoc, my RAG app. Here's the link to the relevant function: https://github.com/thiswillbeyourgithub/wdoc/blob/af5297171ac744677152cf01296e6b24171b7035/wdoc/utils/loaders.py#L2221
Feature request
When using the YouTube loader, I think it would be useful to take the chapters into account if they are present.
Motivation
There is useful information in YouTube chapter titles and timecodes that could be of use to LLMs.
Summaries of transcripts would probably be of higher quality if headers were present rather than a huge wall of text.
Adding metadata is always a win.
Your contribution
Unfortunately not able to help for the time being but wanted to get the idea out there.