
Transcripts of the Huberman Lab Podcast, pulled from YouTube

Full Transcripts of the Huberman Lab Podcast

Available here: https://erik-overdahl.github.io/huberman-lab-transcripts/

This is the data for a static site containing full transcripts of all episodes of the Huberman Lab Podcast. The text is pulled from the captions of the videos posted on YouTube.

Almost all of these are true transcripts, although a few of the podcast videos appear to have only autogenerated captions; those transcripts are labeled as such.

In the future, I would like to have not only these transcripts, but also full-text search over both the podcast transcripts and the comments pulled from YouTube.

Only English transcripts are provided for now. Let me know if you would like to have the Spanish versions as well.

Documentation

This repo provides a command-line interface named huberman-transcripts.

Usage: cli.py [OPTIONS] COMMAND [ARGS]...

Options:
  --install-completion [bash|zsh|fish|powershell|pwsh]
                                  Install completion for the specified shell.
  --show-completion [bash|zsh|fish|powershell|pwsh]
                                  Show completion for the specified shell, to
                                  copy it or customize the installation.

  --help                          Show this message and exit.

Commands:
  download  A wrapper around youtube-dl for downloading video data
  generate  Generate markdown files for Pelican.

Download

Usage: cli.py download [OPTIONS] VIDEO_OR_PLAYLIST_IDS...

  A wrapper around youtube-dl for downloading video data

Arguments:
  VIDEO_OR_PLAYLIST_IDS...  One or more youtube video ids or playlist ids
                            [required]


Options:
  --data-dir TEXT  Directory into which to download data  [default: ./data]
  --help           Show this message and exit.

This site uses captions pulled from the Huberman Lab Podcast videos on YouTube. Data is gathered using huberman-transcripts. Two files are generated for each video: a JSON file containing the video's YouTube metadata, and a .vtt captions file.
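As a rough illustration of what the downloaded pair of files contains, the sketch below loads a trimmed-down metadata record and converts a WebVTT cue timestamp to seconds. The function and variable names here are mine, not the repo's, and the real metadata written by youtube-dl has many more fields.

```python
import json

def parse_vtt_timestamp(ts):
    """Convert a WebVTT timestamp like '00:01:23.456' into seconds."""
    hours, minutes, seconds = ts.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

# A trimmed-down example of the kind of metadata JSON youtube-dl writes;
# the actual file has many more fields (duration, chapters, upload date, ...).
metadata = json.loads('{"id": "abc123", "title": "Example Episode"}')
print(metadata["title"])                    # Example Episode
print(parse_vtt_timestamp("00:00:10.000"))  # 10.0
```

Timestamps parsed this way are what allow captions to be matched against chapter boundaries later in the pipeline.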

Generate

Usage: cli.py generate [OPTIONS] [VIDEO_IDS]...

  Generate markdown files for Pelican

Arguments:
  [VIDEO_IDS]...  The youtube video ids for which to generate markdown files.
                  If empty, generate files for all ids in [DATA_DIR]


Options:
  --data-dir TEXT    Directory of raw video data  [default: ./data]
  --target-dir TEXT  Directory into which to place generated markdown files
                     for static site generator  [default:
                     ./site/content/posts]

  --help             Show this message and exit.

Files are read into objects, which are then used to create markdown files. Each video has "chapters", which are timestamps helpfully provided by the Huberman Lab Podcast team. Captions are matched to chapters by timestamp, and then re-aligned so that sentences do not break over chapter boundaries.
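The matching step can be sketched as follows. This is a minimal illustration under assumed data shapes (lists of `(start_seconds, text)` tuples); the repo's actual objects and function names will differ, and the sentence re-alignment step is omitted here.

```python
# Rough sketch of timestamp-based chapter matching: each caption cue is
# grouped under the last chapter that starts at or before the cue.
def assign_captions_to_chapters(captions, chapters):
    """Group caption cues by chapter.

    captions: list of (start_seconds, text) tuples, sorted by start time.
    chapters: list of (start_seconds, title) tuples, sorted by start time.
    Returns a dict mapping chapter title -> list of caption texts.
    """
    grouped = {title: [] for _, title in chapters}
    starts = [start for start, _ in chapters]
    for start, text in captions:
        # Find the last chapter whose start time is <= this cue's start time.
        idx = 0
        for i, chapter_start in enumerate(starts):
            if chapter_start <= start:
                idx = i
        grouped[chapters[idx][1]].append(text)
    return grouped

chapters = [(0, "Introduction"), (120, "Sleep")]
captions = [(5, "Welcome."), (130, "Let's talk about sleep.")]
print(assign_captions_to_chapters(captions, chapters))
# {'Introduction': ['Welcome.'], 'Sleep': ["Let's talk about sleep."]}
```

After grouping, sentences that straddle a chapter boundary would be moved wholly into one chapter so that no sentence breaks across chapters.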

Site

The site itself is generated using the Pelican static site generator. Everything it needs lives in the site/ directory. The default theme is currently in use, but this is likely to change in the near future.
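A Pelican setup like this is typically driven by a `pelicanconf.py`. The sketch below shows the general shape of such a file; the values are assumptions (the author name is inferred from the repo owner, and the timezone is a placeholder), not the repo's actual settings in site/.

```python
# Minimal pelicanconf.py sketch -- illustrative only; the real settings
# live in the repo's site/ directory and may differ.
AUTHOR = "Erik Overdahl"              # assumption, based on the repo owner
SITENAME = "Huberman Lab Transcripts"
SITEURL = ""                          # set to the GitHub Pages URL when publishing

PATH = "content"                      # where the generated markdown posts live
TIMEZONE = "America/Chicago"          # placeholder assumption
DEFAULT_LANG = "en"
```

Switching away from the default theme would mean adding a `THEME` setting pointing at a theme directory.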

Roadmap

Minimum

  • download data into folder
  • generate markdown
    • table of contents
    • mark autogenerated
  • generate html
  • write a README
  • host on Github Pages
    • get themes working
    • get links working

Improvements

In no particular order.

  • add an about page
  • automate updating with new podcasts
  • add favicon
  • add search
  • optimize site
