Coursera Lecture Grabber

Coursera is a great idea. Free online classes from top universities. A talented teenager not being challenged in school can take a math class from Stanford. A stay-at-home parent can get started on a new profession. A shmoe like me, working in a different field than the one he went to school for, can try to close the gap with his CS-major coworkers. Or people who just want to learn something new can survey a variety of topics taught by experts in their fields.

I don’t think online courses can easily replace in-person learning (not that I ever did much of that myself anyway), but that’s not the point here. Rather, Coursera (and similar sites like MIT’s OpenCourseWare) can help to democratize access to higher learning. Anyone with access to a computer has access to unbelievable educational resources.

Of course, in practice self-learning can be very difficult. I’ve never had much self-discipline; I need teachers and deadlines breathing down my neck in order to get my classwork done. But I’m willing to give this online course thing a shot, and it helps that Coursera has a very slick website. I enrolled in the Princeton course “Algorithms, Part I”, which started this past Sunday. So far it seems much more polished and complete than the course offerings I had looked at on MIT’s OpenCourseWare. This course is obviously geared towards an online audience, not just a recording of an in-person class.

In particular I appreciate that the course lectures are high-quality MP4 files that are easily downloaded, not embedded YouTube videos or the like. However, each week’s lectures are split into multiple parts, and being the lazy person that I am, I did not want to have to remember to log onto the website every Sunday, manually download each video, then copy them over to my phone, tablet, laptop, etc. So I rigged up the following bash script, which downloads the videos from a Coursera lecture site.

It uses two files:
– cookies.txt: a Netscape-style cookies file containing your Coursera login session info (cookie-export extensions for Chrome and Firefox can produce this); a sample of the format is shown after this list
– downloaded.txt: keeps track of which lectures have already been downloaded. It will be created if it doesn’t exist.
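
For reference, a Netscape-format cookies.txt is just a tab-separated text file with one cookie per line, which wget’s --load-cookies reads directly. A rough sketch of what it looks like (the cookie names and values below are placeholders, not Coursera’s real ones):

# Netscape HTTP Cookie File
# domain	subdomains	path	secure	expiry	name	value
.coursera.org	TRUE	/	TRUE	1356998400	session	notarealsessionvalue
class.coursera.org	FALSE	/	FALSE	1356998400	csrf_token	notarealtokenvalue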

Make sure to set the variables under “# SETTINGS”, in particular the course name, the download directory, and the directory containing cookies.txt and downloaded.txt. I have this set to run weekly from cron and download to my Dropbox, so it will automatically sync out to the rest of my devices.
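
A weekly cron entry for this might look something like the line below (the schedule and log path are just placeholders; cron hands the command to sh, so $HOME gets expanded):

# m h dom mon dow  command
0 20 * * 0  $HOME/Dropbox/Code/lecturegrab/lecturegrab.sh >> $HOME/lecturegrab.log 2>&1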

This script could be easily modified to also download the lecture slides, by grepping for .pdf links and downloading those as well.
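
A rough sketch of that change (untested; it assumes the slide links on the index page are absolute https URLs ending in .pdf, and it reuses the same cookie and course variables as the script below):

# Grab every PDF linked from the lecture index page (slides, etc.)
PDFS_AVAIL=$(wget --load-cookies "${COOKIEFILE}" -O - "https://class.coursera.org/${MYCOURSE}/lecture/index" 2>&1 | grep -oE 'https://[^"]*\.pdf')
for p in $PDFS_AVAIL; do
	wget --content-disposition --load-cookies "${COOKIEFILE}" "$p"
done

You would probably also want a second history file, like downloaded.txt, so the slides aren’t re-fetched every week.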

The only caveat with this script is that it depends on the cookies being valid and not expired, so you’ll need to update the cookies file every so often. I started toying with logging in via Perl’s WWW::Mechanize, but didn’t get too far, so I stuck with the simpler solution.

#!/bin/bash
# lecturegrab.sh
# Downloads course lectures from Coursera

# SETTINGS
MYCOURSE='algs4partI-2012-001'
DLDIR="$HOME/Dropbox/Videos"
MYDIR="$HOME/Dropbox/Code/lecturegrab"
COOKIEFILE="$MYDIR/cookies.txt"
HISTORYFILE="$MYDIR/downloaded.txt"

# Check that we can read the cookie file
[ ! -r "${COOKIEFILE}" ] && echo "Error, cannot read ${COOKIEFILE}" && exit 1

# Get the list of all available lectures by scraping the lecture_id out of
# every download.mp4 link on the course's lecture index page
VIDS_AVAIL=$(wget --load-cookies "${COOKIEFILE}" -O - "https://class.coursera.org/${MYCOURSE}/lecture/index" 2>&1 | grep download.mp4 | sed 's/.*lecture_id=\([0-9]\+\).*/\1/' | tr '\n' ' ')

# Go to our downloads directory, then pull every lecture we don't already have
cd "${DLDIR}"
for i in $VIDS_AVAIL; do
	if ! grep "^$i\$" ${HISTORYFILE}; then
		wget --content-disposition --load-cookies ${COOKIEFILE} "https://class.coursera.org/${MYCOURSE}/lecture/download.mp4?lecture_id=$i"
		echo "$i" >> "${HISTORYFILE}"
	fi
done

~ by kilbasar on August 14, 2012.

2 Responses to “Coursera Lecture Grabber”

  1. Hi,

    this is a really cool script!!!

    I wanted to download the PDFs of the lectures too, so I decided to append a little more to the script:
    {code}
    HISTORYFILE_PDF="$MYDIR/downloaded-pdf.txt"
    # Get the list of all available lecture pdfs
    PDFS_AVAIL=`wget --load-cookies ${COOKIEFILE} -O - https://class.coursera.org/${MYCOURSE}/lecture/index 2>&1 | grep -oE "https.*\.pdf"`

    # Go to our downloads directory, then pull every lecture PDF we don't already have
    cd "${DLDIR}"
    for i in $PDFS_AVAIL; do
    if ! grep "^$i\$" ${HISTORYFILE_PDF}; then
    wget --content-disposition --load-cookies ${COOKIEFILE} "$i"
    echo "$i" >> "${HISTORYFILE_PDF}"
    fi
    done
    {code}

    Thanks a lot for sharing this great script!!!

  2. Cool script! I actually found this while looking for a solution for loading cookies with wget.

    I’d approached the problem by running some jQuery in the Chrome dev console to generate a wget string for every video: it grabbed the download URLs and did some string manipulation to get the titles to use as the file names.

    I dumped the list into a script, then realized that I’d have to log in… which was when I found your script, which is way more elegant.

    Thanks
