Welcome to the "Stack Overflow Tag Prediction" repository! This project is dedicated to predicting tags (or labels) for questions posted on Stack Overflow. The primary objective is to develop a machine learning model capable of automatically assigning relevant tags to user-submitted questions. This prediction system aims to simplify the process of finding answers and experts in specific topics on the platform.
Effective tagging is essential on Stack Overflow to ensure that questions reach the right audience and receive accurate responses. However, manual tagging can be cumbersome and time-consuming for users. Our project tackles this challenge by automating the tagging process, making it easier for both question askers and answerers to navigate the vast knowledge repository of Stack Overflow.
With accurate tag predictions, users can find answers to their questions more quickly, and experts can discover questions aligned with their expertise. This leads to a more efficient and collaborative community, ultimately enhancing the Stack Overflow experience for everyone.
- React.js
- Node.js
- Express.js
- MongoDB
- Redux
- JSON Web Token (JWT)
- Machine Learning
- Natural Language Processing (NLP)
To begin working with this project, follow these steps:
- Clone this repository to your local machine and change into it:
  git clone https://github.com/tk-pranav/stackoverflow_tag_prediction.git
  cd stackoverflow_tag_prediction
- Install the Node.js dependencies:
  npm install
- Install the required Python packages:
  pip install -r requirements.txt
- Start the app:
  npm start
- Explore the Jupyter notebooks for data analysis and model development.
- Train and evaluate the tag prediction model using the dataset and the notebook code.
We used the StackSample dataset from Kaggle, a 10% sample of Stack Overflow questions and answers:
https://www.kaggle.com/datasets/stackoverflow/stacksample
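StackSample stores tags separately from the question text. As a rough sketch, assuming `Tags.csv` has one row per question/tag pair with `Id` and `Tag` columns (check the dataset page for the exact layout), the tags can be grouped per question and limited to the most frequent ones. The `keep_top_tags` cutoff is a hypothetical preprocessing choice for illustration, not necessarily what this project does, and the toy rows below are made up.

```python
import csv
import io
from collections import Counter, defaultdict


def load_question_tags(tags_csv):
    """Group Tags.csv-style rows into {question_id: [tag, ...]}.

    Assumes a header row with "Id" and "Tag" columns and one row per
    (question, tag) pair.
    """
    tags_by_id = defaultdict(list)
    for row in csv.DictReader(tags_csv):
        tags_by_id[row["Id"]].append(row["Tag"])
    return dict(tags_by_id)


def keep_top_tags(tags_by_id, k):
    """Hypothetical filter: keep only the k most frequent tags and drop
    questions that are left with no tags at all."""
    counts = Counter(tag for tags in tags_by_id.values() for tag in tags)
    top = {tag for tag, _ in counts.most_common(k)}
    filtered = {}
    for qid, tags in tags_by_id.items():
        kept = [t for t in tags if t in top]
        if kept:
            filtered[qid] = kept
    return filtered


# Made-up rows in the assumed Tags.csv layout.
toy = io.StringIO(
    "Id,Tag\n"
    "1,python\n1,pandas\n"
    "2,python\n"
    "3,javascript\n3,reactjs\n"
    "4,python\n4,javascript\n"
    "5,css\n"
)
tags_by_id = load_question_tags(toy)
top_tags = keep_top_tags(tags_by_id, k=2)  # keeps "python" and "javascript"
```

For the real dataset, pass an open file handle (opened with `newline=""`) instead of the in-memory sample.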
Our approach combines natural language processing (NLP) with classification algorithms to predict tags for Stack Overflow questions. The model is trained on historical question data and predicts relevant tags from the content of each question.
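The README does not name the specific classifier, so the sketch below swaps in a simple technique purely for illustration: one-vs-rest multinomial Naive Bayes, which trains one binary model per tag on bag-of-words features from the question text (HTML stripped, lowercased, tokenized). It is a hypothetical stand-in in plain Python, not the model in the project's notebooks, and the tiny training set is invented.

```python
import math
import re
from collections import Counter

HTML_TAG_RE = re.compile(r"<[^>]+>")             # crude HTML tag stripper
TOKEN_RE = re.compile(r"[a-z0-9][a-z0-9#+]*")    # keeps "c#" and "c++" intact


def tokenize(text):
    """Lowercase, drop HTML tags, and split into word-like tokens."""
    return TOKEN_RE.findall(HTML_TAG_RE.sub(" ", text.lower()))


class OneVsRestNaiveBayes:
    """One binary multinomial Naive Bayes model per tag (Laplace smoothing)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, tag_lists):
        docs = [Counter(tokenize(t)) for t in texts]
        self.vocab = set().union(*docs)
        n_docs = len(docs)
        self.models = {}
        for tag in {tag for tags in tag_lists for tag in tags}:
            pos, neg, n_pos = Counter(), Counter(), 0
            for doc, tags in zip(docs, tag_lists):
                if tag in tags:
                    pos.update(doc)
                    n_pos += 1
                else:
                    neg.update(doc)
            # Smoothed prior odds avoid log(0) when a tag is on every document.
            log_prior_odds = math.log((n_pos + 1) / (n_docs - n_pos + 1))
            self.models[tag] = (log_prior_odds, self._log_probs(pos), self._log_probs(neg))
        return self

    def _log_probs(self, counts):
        """Smoothed log P(word | class) for every vocabulary word."""
        total = sum(counts.values()) + self.alpha * len(self.vocab)
        return {w: math.log((counts[w] + self.alpha) / total) for w in self.vocab}

    def predict(self, text):
        """Return tags whose log posterior odds are positive, best first."""
        doc = Counter(w for w in tokenize(text) if w in self.vocab)  # skip unseen words
        scored = []
        for tag, (log_odds, pos_lp, neg_lp) in self.models.items():
            score = log_odds + sum(c * (pos_lp[w] - neg_lp[w]) for w, c in doc.items())
            if score > 0:
                scored.append((score, tag))
        return [tag for _, tag in sorted(scored, reverse=True)]


# Invented training questions (bodies are HTML, as on Stack Overflow).
texts = [
    "<p>How do I merge two pandas DataFrames in Python?</p>",
    "<p>Python list comprehension with a condition</p>",
    "<p>React component does not re-render after setState in JavaScript</p>",
    "<p>JavaScript array map returns undefined</p>",
]
tag_lists = [["python", "pandas"], ["python"], ["javascript", "reactjs"], ["javascript"]]
model = OneVsRestNaiveBayes().fit(texts, tag_lists)
suggested = model.predict("pandas DataFrame groupby in Python")
```

Scoring each tag independently is what makes this multi-label: a question can receive zero, one or several tags. A library-based pipeline (for example, TF-IDF features with a linear classifier per tag) keeps the same one-model-per-tag structure.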
For the model's performance and evaluation metrics, see the Jupyter notebook. We continue to work on improving the model's accuracy and efficiency.
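The exact metrics reported in the notebook are not listed in this README. For multi-label tag prediction, a common choice is micro-averaged precision, recall and F1, which pool true positives, false positives and false negatives over all questions before dividing. A minimal sketch, assuming ground truth and predictions are both lists of tag lists (the example tags are made up):

```python
def micro_prf(true_tags, pred_tags):
    """Micro-averaged precision, recall and F1 over per-question tag sets."""
    tp = fp = fn = 0
    for truth, pred in zip(true_tags, pred_tags):
        truth, pred = set(truth), set(pred)
        tp += len(truth & pred)   # tags predicted and correct
        fp += len(pred - truth)   # tags predicted but wrong
        fn += len(truth - pred)   # correct tags that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


true_tags = [["python", "pandas"], ["javascript"]]
pred_tags = [["python"], ["javascript", "reactjs"]]
precision, recall, f1 = micro_prf(true_tags, pred_tags)
```

Here each question gets one tag right; the first misses `pandas` and the second adds a spurious `reactjs`, so precision, recall and F1 all come out to 2/3.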
We enthusiastically welcome contributions from the open-source community. If you wish to contribute to this project, please adhere to these guidelines:
- Fork the repository.
- Create a new branch for your changes.
- Implement your contributions.
- Submit a pull request with a clear description of your changes.
Happy Coding!