Scraping Northeastern's Academic Catalog for use in GraduateNU.

sandboxnu/major-scraper

GraduateNU Major Scraper

This repo houses GraduateNU's major requirements scraper. It scrapes the Northeastern Academic Catalog.

Setup

Clone the repo and run:
pnpm install

Running

After installing the dependencies, you can run the scraper with:
pnpm scrape

The scraper scrapes the current catalog by default, but you can pass one or more years as command-line arguments. For example, to scrape the catalogs for 2021, 2022, and the current year, run:
pnpm scrape 2021 2022 current

This will populate the results folder with parsed JSON files and the catalogCache folder with cached HTML.
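
To illustrate the argument convention described above, here is a minimal TypeScript sketch of how year arguments could be interpreted. It is not the scraper's actual code: the `CatalogYear` type and the `parseYearArgs` function are hypothetical names, and the assumption that a year is written as exactly four digits is ours.

```typescript
// Hypothetical sketch, not taken from the scraper's source.
// A catalog is identified either by a year or by the literal "current".
type CatalogYear = number | "current";

function parseYearArgs(args: string[]): CatalogYear[] {
  // No arguments: fall back to the current catalog, the documented default.
  if (args.length === 0) {
    return ["current"];
  }
  return args.map((arg): CatalogYear => {
    if (arg === "current") {
      return "current";
    }
    // Assumption: years are written as exactly four digits, e.g. "2021".
    if (!/^\d{4}$/.test(arg)) {
      throw new Error(`Expected a four-digit year or "current", got "${arg}"`);
    }
    return Number(arg);
  });
}
```

In a Node script, a function like this would typically be called as `parseYearArgs(process.argv.slice(2))`, so that `pnpm scrape 2021 2022 current` yields the years 2021 and 2022 plus the current catalog.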
