
How to download transcripts of YouTube Videos using Python?

With over a billion learning-related videos viewed daily, YouTube can be a great place to learn about things you're passionate about. However, after watching a really cool educational video, there are times when we wish we had made notes to go through later. What if we could extract the entire discussion of the instructor in a text format? Going through transcripts is an awesome way of recollecting the content in the video and we can even highlight all the important sentences for a quick skim through.


This post explains how to download transcripts of YouTube videos using Python and save them as a Word document.

  1. Open Command Prompt

  2. Run 'pip install youtube_transcript_api'. This is a Python API that lets you get the transcript/subtitles for a given YouTube video.

  3. Run 'pip install python-docx'. This package provides the docx module imported below, which creates, reads and writes Microsoft Word (.docx) files.

Importing the installed modules

from youtube_transcript_api import YouTubeTranscriptApi
from docx import Document
from docx.shared import Pt

Making an API request

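video = 'Please enter the URL for your YouTube video'
# For example, video = 'https://www.youtube.com/watch?v=MkNeIUgNPQ8'

# The 11-character video ID follows 'https://www.youtube.com/watch?v='
# (32 characters), so this slice only works for URLs in exactly that form
video_id = video[32:43]

# Making an API request to extract the transcript
# (if your installed version of the library no longer provides
# get_transcript, check its README for the current equivalent)
raw_transcript = YouTubeTranscriptApi.get_transcript(video_id, languages=['en'])

# Each item is a dictionary whose 'text' key holds one caption line
transcript = ' '.join(item['text'] for item in raw_transcript)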
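Slicing the URL at fixed positions breaks as soon as the link is a short youtu.be link or is not in the exact 'https://www.youtube.com/watch?v=' form. As a rough sketch, the ID could instead be parsed out with the standard library. The helper name extract_video_id and the set of URL shapes it handles are assumptions for illustration, not part of youtube_transcript_api.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url):
    """Return the video ID from a YouTube URL, or None if none is found.

    Hypothetical helper: it covers watch, youtu.be, embed and shorts links only.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or '').lower()

    if host == 'youtu.be':
        # Short links: https://youtu.be/<id>
        return parsed.path.lstrip('/').split('/')[0] or None

    if host == 'youtube.com' or host.endswith('.youtube.com'):
        if parsed.path == '/watch':
            # Standard links: https://www.youtube.com/watch?v=<id>
            return parse_qs(parsed.query).get('v', [None])[0]
        for prefix in ('/embed/', '/shorts/'):
            if parsed.path.startswith(prefix):
                return parsed.path[len(prefix):].split('/')[0] or None

    return None
```

With this in place, the result of extract_video_id(video) can be passed to get_transcript in place of video[32:43]. It is worth checking for None first, so that an unrecognized URL produces a clear message rather than a failed request.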

At this point, you can print the transcript to the console with print(transcript). To save the transcript to a Word file as well, copy and paste the following code:

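title = 'Please enter the title for your document'
document = Document()
# Level 0 adds the text in the document's 'Title' style
document.add_heading(title, 0)
paragraph = document.add_paragraph(transcript)
# Changing the font of the 'Normal' style affects every paragraph that uses it
paragraph.style = document.styles['Normal']
font = paragraph.style.font
font.name = 'Arial'
font.size = Pt(11)
paragraph.paragraph_format.line_spacing = 1.5
# The file is written to the current working directory
document.save(title+'.docx')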
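Because the title is reused as the file name, a title such as 'What is AI?' can make the save step fail on Windows, where characters like ? and : are not allowed in file names. A minimal sketch of one way to guard against that follows. The helper name safe_filename and the fallback name are assumptions for illustration.

```python
import re


def safe_filename(title, fallback='transcript'):
    """Build a .docx file name from a title (hypothetical helper)."""
    # Replace the characters Windows forbids in file names: \ / : * ? " < > |
    cleaned = re.sub(r'[\\/:*?"<>|]', '_', title)
    # Drop surrounding whitespace, then any trailing dots or spaces
    cleaned = cleaned.strip().rstrip('. ')
    # Fall back to a default name if nothing usable is left
    return (cleaned or fallback) + '.docx'
```

The document could then be saved with document.save(safe_filename(title)) instead of document.save(title+'.docx').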

Running the code creates a Word file containing the full transcript in the folder the script is run from.
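A single paragraph holding the whole transcript can be hard to skim. Each transcript item returned by the library also carries a 'start' time in seconds alongside its 'text', so one option is to break the text into paragraphs by time window. The sketch below assumes that dictionary shape. The function name group_into_paragraphs and the 60-second default are illustrative choices.

```python
def group_into_paragraphs(entries, window_seconds=60):
    """Group transcript entries into one paragraph per time window.

    Assumes each entry is a dict with 'text' and 'start' (seconds) keys.
    """
    paragraphs = []
    current = []
    window_end = window_seconds

    for entry in entries:
        # Move to the next window once an entry starts past the current one
        while entry['start'] >= window_end:
            if current:
                paragraphs.append(' '.join(current))
                current = []
            window_end += window_seconds
        current.append(entry['text'].strip())

    # Flush whatever is left in the last window
    if current:
        paragraphs.append(' '.join(current))
    return paragraphs
```

To use it, call document.add_paragraph(text) for each text in group_into_paragraphs(raw_transcript) instead of adding the whole transcript in one call. Windows with no captions are skipped, so silent stretches do not produce empty paragraphs.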