A semantic search engine for the Penn course database!
React project containing all frontend-related code and the Docker config
npm install
npm start
app.py: backend routes
courses-scraper.py: scrapes the Penn catalog for courses and saves them in courses.csv
embed.py: embeds course info into vectors via OpenAI (model: text-embedding-3-small) and saves them in courses_embed.csv
review-scraper.py: gets course review information for each course and professor and saves it in courses_embed_profs.csv
mongo_load.py: uploads embeddings from courses_embed_profs.csv into MongoDB
query_engine: logic to query for results
db: MongoDB
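
For orientation, here is a minimal sketch of how an embed-then-rank query could work. It is not the actual query_engine code: the `search` and `cosine_similarity` helpers, the `embedding` field name, and the in-memory course list are all assumptions made for illustration, and the real project may rely on the database's own vector search instead. Only the model name (text-embedding-3-small) comes from this README.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: str, courses: list[dict], embed: Callable[[str], Vector],
           top_k: int = 5) -> list[dict]:
    """Return the top_k courses whose stored embedding is closest to the query.

    Each course dict is assumed (hypothetically) to carry an "embedding" list.
    """
    query_vec = embed(query)
    ranked = sorted(
        courses,
        key=lambda course: cosine_similarity(query_vec, course["embedding"]),
        reverse=True,
    )
    return ranked[:top_k]


def openai_embed(text: str) -> list[float]:
    """Embed text with OpenAI; needs the `openai` package and OPENAI_API_KEY."""
    from openai import OpenAI  # lazy import so the rest runs without the package

    client = OpenAI()
    response = client.embeddings.create(model="text-embedding-3-small", input=text)
    return response.data[0].embedding


# Toy data and a stand-in embedder so the ranking can be tried offline.
toy_courses = [
    {"code": "A", "embedding": [1.0, 0.0]},
    {"code": "B", "embedding": [0.0, 1.0]},
    {"code": "C", "embedding": [0.7, 0.7]},
]


def toy_embed(text: str) -> list[float]:
    """Hypothetical embedder that ignores its input and returns a fixed vector."""
    return [1.0, 0.1]
```

Calling `search("intro to machine learning", toy_courses, toy_embed, top_k=2)` ranks the toy courses without any network access; passing `openai_embed` instead of `toy_embed` would embed the query with the same model the README names.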
- Migrate from Pinecone to an alternative vector DB (Atlas?)
- Remove unused files
- Refactor query logic to be readable and modular
- Write a detailed README on running, stack, etc.
- Move backend into its own directory (maybe move to FastAPI?)
- Move hosting to porter.run
- Cron job to auto-update the DB