Project presentation video: YouTube Video Link
This project aims to make competitive programming studies on Kattis more focused and efficient by categorizing problems based on solution types. Using web scraping and OpenAI’s API, this tool collects problem data, sends descriptions to ChatGPT for classification, and provides a consolidated Excel file that suggests a suitable algorithm type for each problem. This approach benefits anyone looking to streamline their practice in specific problem categories.
For those interested in the problem classification results without diving into the code, the final Excel file with categorized Kattis problems is available here.
This file includes an Algorithm Type column that reflects the classification provided by ChatGPT, indicating the type of solution approach recommended for each problem.
The Kattis Problem Classifier project consists of several scripts designed to collect, process, and classify Kattis problems. Below is a breakdown of each script and its role in the pipeline:
- kattis_problem_scraper.py
  - Purpose: Scrapes Kattis to gather basic problem data, including titles, difficulty levels, and other metadata, for all available problems.
  - Output: Generates an initial Excel file containing the problem data.
- kattis_problem_description_collector.py
  - Purpose: Collects the detailed description of each problem by visiting its individual problem page. Pages are fetched in parallel to speed up collection.
  - Output: Writes a new Excel file that adds a description column to the file generated in the previous step.
- kattis_problem_classifier.py
  - Purpose: Uses OpenAI's API to classify each problem description by solution type, such as dynamic programming or graph traversal. Problems are processed in partitions of 250 to keep each batch manageable and to allow recovery if a run fails partway.
  - Output: Generates multiple Excel files (one per partition) with an algorithm_type column reflecting the classification by ChatGPT.
- kattis_problem_classifier_consolidation.py
  - Purpose: Consolidates the multiple partitioned Excel files into a single, final Excel file.
  - Output: Creates kattis_problems_combined.xlsx in the data_outgoing folder, containing all Kattis problems with algorithm classifications.
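The partition-and-recover behaviour described for kattis_problem_classifier.py can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names and the partition_NNN.xlsx file naming are assumptions, and only the partition size of 250 comes from the description above. The idea is that a partition whose output file already exists is skipped on a rerun, so an interrupted run does not have to start over.

```python
from pathlib import Path
from typing import Iterator, List, Sequence, Tuple

PARTITION_SIZE = 250  # partition size mentioned in the script description


def split_into_partitions(
    problems: Sequence[str], size: int = PARTITION_SIZE
) -> Iterator[Tuple[int, List[str]]]:
    """Yield (partition_index, chunk) pairs of at most `size` problems."""
    for start in range(0, len(problems), size):
        yield start // size, list(problems[start:start + size])


def pending_partitions(
    problems: Sequence[str], out_dir: Path, size: int = PARTITION_SIZE
) -> Iterator[Tuple[int, List[str], Path]]:
    """Yield only the partitions whose output file does not exist yet.

    The file name pattern is hypothetical; the real script may name its
    partition files differently.
    """
    for index, chunk in split_into_partitions(problems, size):
        target = Path(out_dir) / f"partition_{index:03d}.xlsx"
        if not target.exists():
            yield index, chunk, target
```

With 600 problems, `split_into_partitions` yields three partitions of 250, 250, and 100 problems. If an earlier run already wrote the file for the first partition, `pending_partitions` returns only the remaining two.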
- Clone the repository:
  git clone https://github.com/jmfeck/kattis-problem-classifier.git
  cd kattis-problem-classifier
- Set up the environment (with Conda):
  conda env create -f environment.yml
  conda activate kattis-problems-classifier
- Configure OpenAI API: Set your OpenAI API key in the appropriate script or environment variable:
  OPENAI_API_KEY = "your_openai_api_key_here"
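If you go the environment-variable route, the key can be read at startup instead of being hard-coded in a script. The helper below is a sketch under that assumption: `load_openai_api_key` is a hypothetical name, not a function from the project, and it treats the placeholder value shown above as "not configured".

```python
import os
from typing import Mapping


def load_openai_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the API key from OPENAI_API_KEY, or raise if it is missing."""
    key = env.get("OPENAI_API_KEY", "").strip()
    # Treat the README's placeholder value as "not configured".
    if not key or key == "your_openai_api_key_here":
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running the classifier."
        )
    return key
```

Failing early with a clear message avoids discovering a missing key only after the scraping steps have already run.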
Follow this order to execute the scripts and obtain a final classification file:
- Scrape Kattis Problems:
  python scripts/kattis_problem_scraper.py
- Collect Problem Descriptions:
  python scripts/kattis_problem_description_collector.py
- Classify Problems by Solution Type:
  python scripts/kattis_problem_classifier.py
  - Note: This script runs classifications in partitions of 250 problems each, resulting in multiple output files.
- Consolidate Results:
  python scripts/kattis_problem_classifier_consolidation.py
  - Final consolidated file: data_outgoing/kattis_problems_combined.xlsx
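The four steps can also be chained from a small driver, so that a failure stops the run before later steps consume incomplete data. This is a sketch, not part of the repository: `pipeline_commands` and `run_pipeline` are hypothetical helpers, and it assumes the commands are launched from the repository root, as in the steps above.

```python
import subprocess
import sys
from typing import List

# Script paths and order as listed in the steps above.
PIPELINE_SCRIPTS = [
    "scripts/kattis_problem_scraper.py",
    "scripts/kattis_problem_description_collector.py",
    "scripts/kattis_problem_classifier.py",
    "scripts/kattis_problem_classifier_consolidation.py",
]


def pipeline_commands(python: str = sys.executable) -> List[List[str]]:
    """Build one command line per pipeline step, in execution order."""
    return [[python, script] for script in PIPELINE_SCRIPTS]


def run_pipeline() -> None:
    """Run each step in order; check=True raises on the first failure."""
    for command in pipeline_commands():
        print("Running:", " ".join(command))
        subprocess.run(command, check=True)
```

Calling `run_pipeline()` executes the scripts with the current Python interpreter. Because `check=True` raises `subprocess.CalledProcessError` on a non-zero exit code, the consolidation step never runs on top of a failed classification.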
This project serves two main audiences:
- Students and Developers: Those who want to study Kattis problems by solution type can directly use the kattis_problems_combined.xlsx file as a reference.
- Developers and Researchers: Those interested in the methodology can examine the scripts to see how web scraping, parallel processing, and OpenAI API calls can be used to classify problems.
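For the first use case, grouping problems by their assigned category is usually the starting point. The sketch below works on plain dictionaries, one per spreadsheet row, so it does not depend on how the file is loaded. The `algorithm_type` key follows the column name used in the classifier output (the consolidated file may label it differently, so it is a parameter), while `title` and the function name are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def group_by_algorithm(
    rows: Iterable[Mapping[str, str]],
    type_column: str = "algorithm_type",
    title_column: str = "title",  # hypothetical column name
) -> Dict[str, List[str]]:
    """Map each algorithm type to the titles of the problems labeled with it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        # Normalize case and whitespace so near-identical labels share a group.
        label = (row.get(type_column) or "unclassified").strip().lower()
        groups[label].append(row[title_column])
    return dict(groups)
```

Labels are lowercased and trimmed because free-text classifications may vary slightly in spelling; rows with an empty label are collected under "unclassified".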
This project is licensed under the MIT License. See the LICENSE file for details.