How to download transcripts of YouTube Videos using Python?
With over a billion learning-related videos viewed daily, YouTube can be a great place to learn about things you're passionate about. However, after watching a really cool educational video, there are times when we wish we had made notes to go through later. What if we could extract the entire discussion of the instructor in a text format? Going through transcripts is an awesome way of recollecting the content in the video and we can even highlight all the important sentences for a quick skim through.
This post explains how to download transcripts of YouTube videos using Python and save them as a Word document.
Open Command Prompt
Run 'pip install youtube_transcript_api'. This is a Python API that lets you get the transcript/subtitles for a given YouTube video.
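Run 'pip install python-docx'. This package provides the docx module, which creates, reads and writes Microsoft Word .docx files (the format introduced with Word 2007). Note that the package on PyPI named just 'docx' is a separate, older project, so install python-docx instead.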
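Before moving on, it can help to confirm that both installs worked. The snippet below is a minimal sketch that uses only the standard library; the helper name missing_modules is made up for this post and is not part of either package. It checks the import names (youtube_transcript_api and docx), which are not always the same as the names you pass to pip.

```python
from importlib.util import find_spec


def missing_modules(module_names):
    """Return the names from module_names that cannot be found for import."""
    return [name for name in module_names if find_spec(name) is None]


# Import names (not pip package names) used in this post.
required = ["youtube_transcript_api", "docx"]

missing = missing_modules(required)
if missing:
    print("Not installed yet:", ", ".join(missing))
else:
    print("All required modules are installed.")
```

If anything is reported as missing, re-run the matching pip command in the same Python environment you use to run the script.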
Importing the installed modules
from youtube_transcript_api import YouTubeTranscriptApi
from docx import Document
from docx.shared import Pt
Making an API request
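video = 'Please enter the URL for your YouTube video'
# For example, video = 'https://www.youtube.com/watch?v=MkNeIUgNPQ8'

# The video ID is the 11 characters after 'https://www.youtube.com/watch?v='
# (32 characters), so this slice only works for URLs in exactly that form.
video_id = video[32:43]

# Making an API request to extract the transcript.
# Newer releases of youtube_transcript_api may expose a different interface,
# so check the documentation for the version you have installed.
raw_transcript = YouTubeTranscriptApi.get_transcript(video_id, languages=['en'])

# Each item holds one caption segment under the 'text' key.
transcript = ' '.join(item['text'] for item in raw_transcript)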
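Slicing the URL at fixed positions breaks as soon as the link has a different shape, for example a youtu.be short link. Here is a rough sketch of a more forgiving approach using the standard library's urllib.parse. The function name extract_video_id and the list of accepted hosts are assumptions made for this post; other link styles (such as embed or shorts URLs) are not handled.

```python
from urllib.parse import urlparse, parse_qs

# Hosts this sketch recognises; extend as needed (assumption, not exhaustive).
WATCH_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(url):
    """Return the video ID from a YouTube URL, or None if none is found."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links keep the ID in the path: https://youtu.be/<id>
        return parsed.path.lstrip("/").split("/")[0] or None
    if host in WATCH_HOSTS:
        # Standard watch links keep the ID in the 'v' query parameter.
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    return None


video_id = extract_video_id("https://www.youtube.com/watch?v=MkNeIUgNPQ8&t=30s")
print(video_id)
```

The returned ID can be passed to the transcript request in place of the fixed slice, after checking that it is not None.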
At this point, you can print the transcript to the console with print(transcript). To save the transcript to a Word file, copy and paste the following code:
title = 'Please enter the title for your document'
document = Document()
document.add_heading(title, 0)
paragraph = document.add_paragraph(transcript)
paragraph.style = document.styles['Normal']
font = paragraph.style.font
font.name = 'Arial'
font.size = Pt(11)
paragraph.paragraph_format.line_spacing = 1.5
document.save(title + '.docx')
On running the code, a Word file containing the full transcript will be created in the current working directory (typically the folder you run the script from).
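A long video produces one very long paragraph, which is hard to skim. One possible refinement is to break the transcript into paragraphs by time. The sketch below assumes each transcript item is a dictionary with a 'start' time in seconds alongside 'text'; the post itself only relies on 'text', so check the shape of raw_transcript for your installed version. The function name group_into_paragraphs and the 60-second window are illustrative choices, and the sample data is made up so the snippet runs without a network request.

```python
def group_into_paragraphs(segments, window_seconds=60.0):
    """Group caption segments into paragraphs covering fixed time windows."""
    paragraphs = []
    current = []
    window_end = window_seconds
    for segment in segments:
        # Close the current paragraph once a segment starts past the window.
        while segment["start"] >= window_end:
            if current:
                paragraphs.append(" ".join(current))
                current = []
            window_end += window_seconds
        # Captions can contain line breaks; collapse all whitespace runs.
        current.append(" ".join(segment["text"].split()))
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


# Made-up sample in the assumed shape of the transcript items.
sample = [
    {"text": "Hello and\nwelcome", "start": 0.0},
    {"text": "to the course.", "start": 4.2},
    {"text": "Let's begin.", "start": 61.5},
]
paragraphs = group_into_paragraphs(sample)
print(paragraphs)
```

Each string in the result could then be written with its own document.add_paragraph call instead of adding the whole transcript at once.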