This repository contains Python scripts and a local Flask web application for transcribing YouTube videos. The scripts can retrieve caption transcripts through the YouTube Data API, download audio from YouTube videos, and convert that audio to text using speech recognition.
- get_youtube_captions.py: Contains functions to retrieve YouTube video caption transcripts using the YouTube Data API.
- youtube_speech_recognition.py: Provides functionality to download audio from YouTube videos for transcription.
- download_youtube_audio.py: Script to download YouTube audio and save it with a timestamp-based filename.
- index.html: HTML template for the Flask web application UI to transcribe YouTube videos.
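The helpers these scripts rely on can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names `extract_video_id`, `timestamped_filename`, and `transcribe_wav` are hypothetical, the filename pattern is an assumed format, and the choice of the Google Web Speech recognizer is an assumption about which SpeechRecognition backend is used.

```python
from datetime import datetime
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hostnames treated as YouTube watch-page hosts in this sketch (assumption).
YOUTUBE_HOSTS = ("youtube.com", "www.youtube.com", "m.youtube.com")


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a YouTube URL, or None if none is found."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID in the path: https://youtu.be/<id>
        return parsed.path.lstrip("/") or None
    if host in YOUTUBE_HOSTS:
        # Watch links carry the ID in the "v" query parameter.
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    return None


def timestamped_filename(prefix: str = "audio", ext: str = "wav",
                         now: Optional[datetime] = None) -> str:
    """Build a filename such as audio_20240102_030405.wav (assumed pattern)."""
    now = now or datetime.now()
    return f"{prefix}_{now.strftime('%Y%m%d_%H%M%S')}.{ext}"


def transcribe_wav(path: str) -> str:
    """Convert a .wav file to text with the SpeechRecognition library."""
    # Imported lazily so the helpers above work without the dependency.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file into memory
    # Sends the audio to the Google Web Speech API; requires network access.
    return recognizer.recognize_google(audio)
```

For example, `extract_video_id("https://youtu.be/abc123")` yields the ID that a captions lookup needs, and `timestamped_filename()` gives each downloaded audio file a distinct name.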
- Clone the repository:
  git clone https://github.com/heyfoz/python-youtube-transcription.git
- Install the required libraries:
  pip install Flask google-api-python-client pytube pydub SpeechRecognition
- Ensure you have set up your Google API key and environment variable as specified in get_youtube_captions.py.
- Run the desired Flask application (get_youtube_captions.py or youtube_speech_recognition.py).
- Open your web browser and navigate to http://localhost:5000 to access the web application.
- Enter a YouTube video URL and choose the desired transcription option.
- Optionally, use download_youtube_audio.py to download a .wav file to the project directory.
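The API key step above might look something like the sketch below. It assumes the key is read from an environment variable; the name `YOUTUBE_API_KEY` and both function names are hypothetical, so check get_youtube_captions.py for the variable name the script actually expects.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None,
                 name: str = "YOUTUBE_API_KEY") -> str:
    """Read the API key from the environment, failing early if it is missing.

    The variable name is a placeholder chosen for this sketch.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {name} environment variable before starting the app."
        )
    return key


def build_youtube_client(api_key: str):
    """Create a YouTube Data API v3 client with google-api-python-client."""
    # Imported lazily so load_api_key can be used without the dependency.
    from googleapiclient.discovery import build

    return build("youtube", "v3", developerKey=api_key)
```

Failing at startup with a clear message is usually easier to debug than an authentication error raised later, in the middle of a request.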
- YouTube Data API Documentation
- pytube Documentation
- pydub Documentation
- SpeechRecognition Documentation
- The index.html file can be used with any of the Flask scripts mentioned above.
- There are some limitations on YouTube regarding audio availability based on intellectual property concerns and user configurations. Some videos may not be accessible for transcription.
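Because some videos cannot be accessed, a caller may want to turn failures into a readable message rather than an unhandled error. The wrapper below is one possible sketch; `try_transcribe` and the `transcribe` callable are hypothetical names, not functions from this repository.

```python
from typing import Callable, Tuple


def try_transcribe(url: str,
                   transcribe: Callable[[str], str]) -> Tuple[bool, str]:
    """Run a transcription function and report failure instead of raising.

    Returns (True, transcript) on success or (False, message) on failure.
    """
    try:
        return True, transcribe(url)
    except Exception as exc:  # broad on purpose: this is only a sketch
        return False, f"Could not transcribe {url}: {exc}"
```

A real application would likely catch narrower exceptions, such as the ones raised by the download or speech recognition library in use, so that programming errors are not hidden.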
This project is licensed under the MIT License - see the LICENSE file for details.