Course Recommendation System

This project was developed for a hackathon challenge: build a course recommendation system from datasets published by Coursera, Stanford, and Harvard. All three datasets were explored, but the final system uses only the Coursera data. It is content-based, built on TF-IDF vectorization of course descriptions and cosine similarity. Each file in the project is described below.

Project Structure and File Descriptions

Data Files (Stored in the `data` folder)

Coursera.csv:
The main dataset used for building the recommendation system, containing course descriptions and metadata.
coursera_processed_data.csv:
A processed version of the Coursera dataset, cleaned and structured for easier analysis.
coursera_processed_data_with_cluster.csv:
The Coursera dataset after applying K-means clustering, with each course assigned a category label (e.g., Art, Cybersecurity).
mooc_action_features.tsv, mooc_actions.tsv, and mooc_action_labels.tsv:
Stanford dataset files used for exploration but not incorporated into the final system.
HXPC13_DI_v3_11-13-2019.csv:
A dataset from the Harvard collection explored early in the project but excluded due to insufficient course data.
Person Course Documentation.pdf & Person Course Deidentification.pdf:
Documents related to the Harvard dataset describing the data and its anonymization process.

Exploration Files

explore_harvard_data.ipynb:
A Jupyter notebook used to explore the Harvard dataset. The analysis found that the dataset covers only five courses and has a low user-course interaction rate (~1%), which is why it was excluded from the final system.
explore_stanford_data.ipynb:
A Jupyter notebook used to explore the Stanford dataset. The dataset contains only course IDs, with no course names, which makes it difficult to use for recommendations.

Clustering and Recommendation Files

clustering.ipynb:
This notebook contains the code for applying K-means clustering to the Coursera dataset to generate course categories. Categories like "Art" and "Machine Learning" were created to help manage and filter the dataset.
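The clustering step can be sketched as follows. This is a minimal, dependency-free illustration and not the notebook's code: the four course descriptions, the bag-of-words features, the `kmeans` helper, and the `category_names` mapping are all assumptions made for the example. K-means only produces numeric cluster ids, so attaching a label such as "Art" is shown here as a separate, manual step.

```python
import math

# Hypothetical course descriptions (not taken from Coursera.csv).
descriptions = [
    "drawing painting and color theory",
    "network security and encryption basics",
    "painting techniques and art history",
    "encryption malware and network defense",
]

def bag_of_words(texts):
    """Turn each text into a vector of word counts over a shared vocabulary."""
    tokenized = [text.lower().split() for text in texts]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    return [[tokens.count(word) for word in vocabulary] for tokens in tokenized]

def kmeans(points, k, max_iterations=20):
    """Lloyd's algorithm; the first k points seed the centroids (deterministic)."""
    centroids = [list(point) for point in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(point, centroids[c]))
            for point in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(column) / len(members) for column in zip(*members)]
    return labels

labels = kmeans(bag_of_words(descriptions), k=2)

# Assumed manual step: a person inspects each cluster and gives it a name.
category_names = {labels[0]: "Art", labels[1]: "Cybersecurity"}
categories = [category_names[label] for label in labels]
```

Seeding from the first `k` points keeps this toy run reproducible; library implementations typically use a randomized initialization scheme such as k-means++.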
recommendation_system.ipynb:
The main notebook for building the recommendation system. It uses TF-IDF vectorization to convert course descriptions into feature vectors, then ranks courses by the cosine similarity between those vectors to recommend courses with similar content.
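To make the content-based approach concrete, here is a minimal sketch of TF-IDF weighting followed by cosine similarity, written with the standard library only. It is an illustration under stated assumptions rather than the notebook's implementation: the `courses` catalogue, the whitespace tokenizer, the smoothed IDF formula, and the helper names `tfidf_vectors`, `cosine_similarity`, and `recommend` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical catalogue: course title -> description (not from Coursera.csv).
courses = {
    "Intro to Machine Learning": "machine learning models and neural networks",
    "Deep Learning Basics": "neural networks and deep learning models",
    "Watercolor Painting": "painting techniques with watercolor and color",
}

def tfidf_vectors(texts):
    """Return one sparse {term: weight} vector per text."""
    tokenized = [text.lower().split() for text in texts]
    n_docs = len(tokenized)
    # Document frequency: in how many texts does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Term count times a smoothed inverse document frequency.
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(title, top_n=2):
    """Rank every other course by similarity to the given course."""
    titles = list(courses)
    vectors = tfidf_vectors([courses[t] for t in titles])
    query = vectors[titles.index(title)]
    scored = [
        (cosine_similarity(query, vector), other)
        for other, vector in zip(titles, vectors)
        if other != title
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [other for _, other in scored[:top_n]]

top_matches = recommend("Intro to Machine Learning")
```

In this toy catalogue the machine learning course shares most of its vocabulary with the deep learning course and only the word "and" with the painting course, so the deep learning course is ranked first.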

Dashboard and Similarity Matrix

similarity_matrix.pkl:
A serialized (pickled) similarity matrix that stores precomputed course-to-course similarities, so they do not need to be recomputed each time a recommendation is made.
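The idea of storing precomputed similarities can be sketched with Python's `pickle` module. The actual layout of `similarity_matrix.pkl` is not documented here, so everything below is an assumption for illustration: the demo file name, the `{"titles": ..., "matrix": ...}` layout, the similarity values, and the helper names `save_matrix`, `load_matrix`, and `top_similar`.

```python
import os
import pickle
import tempfile

# Hypothetical precomputed similarities for three placeholder courses.
titles = ["Course A", "Course B", "Course C"]
similarity = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]

def save_matrix(path, titles, matrix):
    """Serialize the titles and their similarity matrix to one file."""
    with open(path, "wb") as handle:
        pickle.dump({"titles": titles, "matrix": matrix}, handle)

def load_matrix(path):
    """Load the titles and matrix back without recomputing anything."""
    with open(path, "rb") as handle:
        data = pickle.load(handle)
    return data["titles"], data["matrix"]

def top_similar(title, titles, matrix, top_n=2):
    """Look up one row of the matrix and return the most similar other courses."""
    row = matrix[titles.index(title)]
    others = [i for i in range(len(titles)) if titles[i] != title]
    others.sort(key=lambda i: row[i], reverse=True)
    return [titles[i] for i in others[:top_n]]

# Write to a temporary directory so the demo does not touch project files.
path = os.path.join(tempfile.mkdtemp(), "similarity_matrix_demo.pkl")
save_matrix(path, titles, similarity)
loaded_titles, loaded_matrix = load_matrix(path)
nearest = top_similar("Course A", loaded_titles, loaded_matrix)
```

With the matrix loaded, a recommendation is a single row lookup and a sort. Note that `pickle.load` can execute arbitrary code, so only files from a trusted source should be unpickled.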

Conclusion

This project demonstrates the steps taken to create a content-based recommendation system by exploring, clustering, and analyzing multiple datasets. The system is designed for easy dataset management and includes a dashboard for viewing course categories.
