Available here: https://erik-overdahl.github.io/huberman-lab-transcripts/
This is the data for a static site containing full transcripts of all of the episodes of the Huberman Lab Podcast. The text is pulled from the captions of the videos posted on YouTube.
Almost all of these are true transcripts, although a few of the podcast videos seem to have only autogenerated captions - those transcripts are labeled as such.
In the future, I would like to have not only these transcripts, but also full-text search of the podcast transcripts AND the comments pulled from YouTube.
Only English transcripts are provided for now. Let me know if you would like to have the Spanish versions as well.
This repo provides a command-line interface named `huberman-transcripts`:
```
Usage: cli.py [OPTIONS] COMMAND [ARGS]...

Options:
  --install-completion [bash|zsh|fish|powershell|pwsh]
                                  Install completion for the specified shell.
  --show-completion [bash|zsh|fish|powershell|pwsh]
                                  Show completion for the specified shell, to
                                  copy it or customize the installation.
  --help                          Show this message and exit.

Commands:
  download  A wrapper around youtube-dl for downloading video data
  generate  Generate markdown files for Pelican.
```
```
Usage: cli.py download [OPTIONS] VIDEO_OR_PLAYLIST_IDS...

  A wrapper around youtube-dl for downloading video data

Arguments:
  VIDEO_OR_PLAYLIST_IDS...  One or more youtube video ids or playlist ids
                            [required]

Options:
  --data-dir TEXT  Directory into which to download data  [default: ./data]
  --help           Show this message and exit.
```
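For example, to fetch the data for one video into the default data directory (the id below is a placeholder, not a real episode):

```sh
python cli.py download VIDEO_ID --data-dir ./data
```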
This site uses captions pulled from the Huberman Lab Podcast videos on YouTube. Data is gathered using `huberman-transcripts`. Two files are generated for each video: a JSON file containing all of the metadata about the YouTube video, and a .vtt captions file.
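As a rough sketch of how those two files can be read back in (the file names below follow youtube-dl's default naming and are an assumption, not necessarily what this repo writes):

```python
import json
import re
from pathlib import Path

DATA_DIR = Path("./data")

def load_video(video_id: str):
    """Load the metadata JSON and parse the .vtt captions for one video."""
    # Assumed file names, modeled on youtube-dl's defaults
    meta = json.loads((DATA_DIR / f"{video_id}.info.json").read_text())

    cues = []  # (start, end, text) triples
    vtt = (DATA_DIR / f"{video_id}.en.vtt").read_text()
    # A WebVTT cue block begins with an "HH:MM:SS.mmm --> HH:MM:SS.mmm"
    # timing line; everything after it in the block is caption text.
    for block in vtt.split("\n\n"):
        m = re.search(r"(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})", block)
        if m:
            cues.append((m.group(1), m.group(2), block[m.end():].strip()))
    return meta, cues
```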
```
Usage: cli.py generate [OPTIONS] [VIDEO_IDS]...

  Generate markdown files for Pelican

Arguments:
  [VIDEO_IDS]...  The youtube video ids for which to generate markdown files.
                  If empty, generate files for all ids in [DATA_DIR]

Options:
  --data-dir TEXT    Directory of raw video data  [default: ./data]
  --target-dir TEXT  Directory into which to place generated markdown files
                     for static site generator  [default:
                     ./site/content/posts]
  --help             Show this message and exit.
```
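A typical invocation, regenerating the markdown for every downloaded video using the documented defaults:

```sh
python cli.py generate --data-dir ./data --target-dir ./site/content/posts
```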
The JSON and .vtt files are read into objects, which are then used to create the markdown files. Each video has "chapters", which are timestamps helpfully provided by the Huberman Lab Podcast team. Captions are matched to chapters by timestamp, and then re-aligned so that sentences do not break over chapter boundaries.
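The matching and re-alignment might look roughly like the sketch below (illustrative names and simplified data shapes, not the repo's actual code). Each cue is assigned to the last chapter that starts at or before it; a second pass then pulls any sentence fragment that spills over a boundary back into the chapter where the sentence began.

```python
import re
from bisect import bisect_right

def assign_cues_to_chapters(chapters, cues):
    """Group caption cues by chapter.

    chapters: (start_seconds, title) pairs, sorted by start time
    cues:     (start_seconds, text) pairs, sorted by start time
    Returns an ordered list of (title, text) pairs.
    """
    starts = [start for start, _ in chapters]
    buckets = [[] for _ in chapters]
    for start, text in cues:
        # index of the last chapter whose start time is <= the cue's start
        idx = max(bisect_right(starts, start) - 1, 0)
        buckets[idx].append(text)
    return [(title, " ".join(bucket)) for (_, title), bucket in zip(chapters, buckets)]

def realign_sentence_breaks(sections):
    """If a chapter's text ends mid-sentence, pull the rest of that
    sentence forward from the start of the next chapter."""
    sections = [list(pair) for pair in sections]
    for i in range(len(sections) - 1):
        text = sections[i][1].rstrip()
        if text and text[-1] not in ".!?":
            nxt = sections[i + 1][1]
            m = re.search(r"[.!?]", nxt)  # end of the straddling sentence
            if m:
                sections[i][1] = text + " " + nxt[: m.end()]
                sections[i + 1][1] = nxt[m.end():].lstrip()
    return [tuple(pair) for pair in sections]
```

Sorting chapters once and using `bisect_right` keeps the per-cue lookup logarithmic, and the re-alignment pass only ever moves text across a single boundary.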
The site itself is generated using the Pelican static site generator. Everything it needs lives in the site/ directory. The default theme is currently in use, but this is likely to change in the near future.
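Building the site locally should then be the standard Pelican workflow, something like the following (the settings file name is an assumption; pelicanconf.py is simply Pelican's default):

```sh
pelican site/content -o site/output -s site/pelicanconf.py
```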
Done so far:

- download data into a folder
- generate markdown
- add a table of contents
- mark autogenerated transcripts
- generate HTML
- write a README
- host on GitHub Pages
- get themes working
- get links working

Still to do, in no particular order:
- add an about page
- automate updating with new podcasts
- add favicon
- add search
- optimize site