A simple Python module to download Keats content

LittleHellcat13/keats_crawler

 
 

Keats Crawler

This is a Python module for downloading videos and other resources from KCL's Keats e-learning platform.

By default, it remembers the content you have already downloaded and skips it on later runs, so only newer files are fetched. It works on most module pages, though some edge cases may cause issues.
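The remember-and-skip behaviour can be pictured with a small sketch. This is not the module's actual implementation: the `DownloadLog` class, its JSON file format, and the idea of keying on URLs are all assumptions made for illustration.

```python
import json
from pathlib import Path


class DownloadLog:
    """Hypothetical duplicate filter: a JSON file listing URLs already fetched."""

    def __init__(self, path):
        self.path = Path(path)
        # Load previously recorded URLs, or start empty on the first run.
        if self.path.exists():
            self.seen = set(json.loads(self.path.read_text()))
        else:
            self.seen = set()

    def is_new(self, url):
        """Return True if this URL has not been recorded yet."""
        return url not in self.seen

    def remember(self, url):
        """Record a URL and persist the log immediately."""
        self.seen.add(url)
        # Writing after every addition lets an interrupted crawl keep its progress.
        self.path.write_text(json.dumps(sorted(self.seen)))
```

A crawl loop would call `is_new(url)` before each download and `remember(url)` after it succeeds, which roughly mirrors the SKIP_DUPLICATES and REMEMBER_DOWNLOADS settings described under Config settings.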

NEW: Now with the ability to download videos from Microsoft Stream.

Requirements

  • Python 3
  • Works on Linux, Windows (via WSL), and macOS

Installation

  1. Clone this repository: git clone https://github.com/mannmann2/keats_crawler.git
  2. cd keats_crawler
  3. pip install -r requirements.txt
  4. Download chromedriver for your version of Chrome
  5. Install ffmpeg (on Debian- or Ubuntu-based systems: sudo apt install ffmpeg)
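Before the first run, it can help to confirm that the external tools from the steps above are in place. The helper below is a hypothetical pre-flight check and is not part of this repository; it assumes chromedriver is located by an explicit path (as with PATH_TO_CHROMEDRIVER in config.py) and that ffmpeg is expected on the system PATH.

```python
import os
import shutil


def missing_prerequisites(chromedriver_path, tools=("ffmpeg",)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # chromedriver is checked at the explicit path given by the caller.
    if not os.path.isfile(chromedriver_path):
        problems.append(f"chromedriver not found at {chromedriver_path}")
    # The remaining tools are looked up on the system PATH.
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems
```

Calling `missing_prerequisites("/path/to/chromedriver")` and printing the result gives a quick summary of anything still to install.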

Usage

  1. Self-enrol in the Keats module if you are not already enrolled
  2. Update config.py (see Config settings below; cookies must be refreshed each time your session expires)
  3. Run: python crawl.py
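Because the cookies have to be refreshed whenever the session expires, a small helper for converting a copied Cookie request header into a Python mapping may save some typing. This is only a sketch: `parse_cookie_header` is a hypothetical function, and the format that config.py expects for COOKIES is not stated here, so the name-to-value dict it produces is an assumption.

```python
def parse_cookie_header(header):
    """Split a 'name=value; name2=value2' string into a dict of name -> value."""
    cookies = {}
    for pair in header.split(";"):
        # partition keeps any further '=' characters inside the value.
        name, sep, value = pair.strip().partition("=")
        if sep and name:  # ignore empty or malformed fragments
            cookies[name] = value
    return cookies
```

For example, `parse_cookie_header("a=1; b=2")` yields a dict with the keys `a` and `b`, which could then be pasted into config.py if a dict is what it expects.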

Usage for downloading videos from Microsoft Stream

  1. Get the video links and an access token from https://web.microsoftstream.com and set them in the config (MS_STREAMS_LINKS and MS_STREAMS_ACCESS_TOKEN)
  2. Run: python msstream.py

Config settings

MODULE: Name of the module, also used as the name of the folder that files are downloaded into
URLS: Mapping between module names and their Keats URLs
PATH: Location in which to create the module folder
PATH_TO_CHROMEDRIVER: Location of the chromedriver executable
COOKIES: Copy and add cookies from your browser after logging into Keats. These can be found in the Network tab of the browser inspector.
DOWNLOAD_RESOURCES: True/False - Download the non-video resources (ppt, pdf, py, etc.)
DOWNLOAD_VIDEOS: True/False - Download videos embedded in Keats (does not work for videos that are only linked from Keats and hosted on another website)

VIDEO_PROMPT: True/False - Prompt before extracting each video for download (if disabled, all extracted videos are downloaded automatically)
VIDEO_LIMIT: Integer or None - Limit the number of videos extracted
SKIP_DUPLICATES: True/False - Skip files already downloaded (only works if the previous downloads were made with this package)
REMEMBER_DOWNLOADS: True/False - Add files downloaded in the current crawl to the duplicate filter (used to check for duplicates)

MS_STREAMS_LINKS: Links to the videos on Microsoft Stream
MS_STREAMS_ACCESS_TOKEN: Authorization token for the Microsoft Stream internal API
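Taken together, the settings above might look like the following in config.py. Every value is a placeholder. The container types are assumptions where the descriptions do not state them: a dict for COOKIES, a list for MS_STREAMS_LINKS, and strings for the paths. The config.py shipped in the repository remains the reference.

```python
# config.py: illustrative placeholders only, not working values.
MODULE = "example_module"
URLS = {"example_module": "<Keats URL of the module page>"}  # module name -> URL
PATH = "downloads"  # the module folder is created inside this location
PATH_TO_CHROMEDRIVER = "/path/to/chromedriver"
COOKIES = {"cookie_name": "cookie_value"}  # assumed to be a name -> value dict

DOWNLOAD_RESOURCES = True
DOWNLOAD_VIDEOS = True

VIDEO_PROMPT = False  # download every extracted video without asking
VIDEO_LIMIT = None  # no cap on the number of videos extracted
SKIP_DUPLICATES = True
REMEMBER_DOWNLOADS = True

MS_STREAMS_LINKS = []  # assumed to be a list of video URLs
MS_STREAMS_ACCESS_TOKEN = ""  # placeholder; paste the token here
```

With these particular values a crawl would fetch both resources and videos without prompting, and skip anything recorded by an earlier run.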

  • Free software: MIT license
