Backend: AWS DynamoDB with GraphQL API.
Database generation: Python scripts that parse API data, extract courses from PDFs, and upload the results by sending requests to the GraphQL API.
Frontend: React+Tailwind
Generating the database was the most challenging and time-consuming part. Over 100,000 PDFs have been parsed into roughly 1 million courses across roughly 1,000 majors. The general process can be broken down into four steps:
- Get metadata: Download college IDs, majors, and PDF document IDs. Jacobtbigham's documentation of the Assist API endpoints that expose this information was extremely helpful for this step. Check out his project here: www.github.com/jacobtbigham/ccc_transfers, www.jacobtbigham.com/transfers
- Create Database: Create GraphQL schema, integrate Python with the GraphQL API
- Parse Data: Use the metadata to fetch every PDF document, and extract the articulated courses from each PDF using a custom algorithm
- Upload Data: Use GraphQL mutations to upload the reformatted information to the database
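
As a rough illustration of the upload step, the sketch below sends one parsed course to a GraphQL endpoint using only the Python standard library. It is not the project's actual upload script: the `createCourse` mutation, the `CreateCourseInput` type, the course fields, and the helper names `build_payload` and `upload_course` are all hypothetical, and the `x-api-key` header assumes the API uses API-key authentication, which may not match the real setup.

```python
import json
import urllib.request

# Hypothetical mutation: the project's real schema is not shown here,
# so the mutation name and input type are assumptions.
CREATE_COURSE_MUTATION = """
mutation CreateCourse($input: CreateCourseInput!) {
  createCourse(input: $input) {
    id
  }
}
"""


def build_payload(course: dict) -> bytes:
    """Serialize one course into a GraphQL-over-HTTP JSON request body."""
    body = {"query": CREATE_COURSE_MUTATION, "variables": {"input": course}}
    return json.dumps(body).encode("utf-8")


def upload_course(course: dict, endpoint: str, api_key: str) -> dict:
    """POST one course to the GraphQL endpoint and return the decoded response."""
    request = urllib.request.Request(
        endpoint,
        data=build_payload(course),
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme (API key header); adjust for the real API.
            "x-api-key": api_key,
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `upload_course({"id": "c1", "title": "Calculus I"}, endpoint, api_key)` with a real endpoint URL and key would send a single mutation. With around a million courses, one request per course is slow, so a real script would likely batch several mutations per request or upload concurrently.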