AISA study project
SportBet is an AISA study project aimed at predicting NBA game outcomes through data analysis and machine learning. This project involves gathering NBA data, preprocessing it, and applying machine learning techniques to design predictive models. The primary goal is to develop a system that provides insights into betting strategies for NBA games.
- Sports: Revolutionize coaching, fan engagement, or athlete training with ML-powered insights.
- Gaming: Enhance player experiences by personalizing game content or improving multiplayer matchmaking.
- Music: Democratize music creation with AI, from personalized recommendations to automated composition.
- Learning Opportunity: The hackathon provides a structured opportunity to explore end-to-end ML pipelines.
- Hands-on experience with:
  - Data Collection: How to gather meaningful datasets.
  - Preprocessing: Cleaning and transforming data.
  - Model Training: Building ML models.
  - Evaluation & Deployment: Validating and showcasing solutions in practical, real-world settings.
- Available sources: Mention where the data came from (e.g., APIs, web scraping, sensors, public datasets).
- Collection: Collect your own data.
- Challenges: Briefly discuss any difficulties (e.g., missing data, limited availability).
- Volume: Indicate the size and type of data (e.g., 10GB of CSVs, 100K images).
- Tools: Mention any tools used for data collection (e.g., Python scripts, Google Sheets, Mobile cameras).
- Steps Taken: Outline how you prepared the data (e.g., cleaning, normalization, feature engineering).
- Techniques/Tools: Mention techniques (e.g., handling missing values, one-hot encoding) and tools (e.g., Pandas, NumPy).
- Challenges Solved: Highlight any issues you overcame, such as imbalanced datasets or noisy data.
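Feature engineering for game prediction has to avoid leaking the result being predicted into the features. The sketch below illustrates this by computing each team's average points over its previous games only. It is a hypothetical, dependency-free example: the row keys `date`, `team` and `pts` and the helper name `add_rolling_points` are assumptions, not the project's actual schema.

```python
from collections import defaultdict, deque


def add_rolling_points(games, window=3):
    """Return copies of `games` with a `pts_rolling_avg` field added.

    `games` is a list of dicts with hypothetical keys "date", "team" and "pts",
    already sorted by date. Each average uses only that team's previous
    `window` games, so a game's own score never leaks into its feature.
    """
    history = defaultdict(lambda: deque(maxlen=window))
    enriched = []
    for game in games:
        past = history[game["team"]]
        # None when the team has no earlier games in the data.
        avg = sum(past) / len(past) if past else None
        enriched.append({**game, "pts_rolling_avg": avg})
        past.append(game["pts"])  # record the result only after using the history
    return enriched


# Made-up example rows, for illustration only.
games = [
    {"date": "2024-01-01", "team": "A", "pts": 100},
    {"date": "2024-01-01", "team": "B", "pts": 90},
    {"date": "2024-01-03", "team": "A", "pts": 110},
    {"date": "2024-01-05", "team": "A", "pts": 120},
    {"date": "2024-01-07", "team": "A", "pts": 130},
]
for row in add_rolling_points(games):
    print(row["date"], row["team"], row["pts_rolling_avg"])
```

With pandas, a comparable column could be built with `df.groupby("team")["pts"].transform(lambda s: s.shift(1).rolling(3, min_periods=1).mean())`, assuming `df` is sorted by date.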
- Model Choice: State the model(s) used (e.g., Random Forest, CNN, Transformers) and why they were selected.
- Frameworks: List frameworks/libraries (e.g., TensorFlow, PyTorch, Scikit-learn).
- Training Specs: Mention key details (e.g., epochs, hyperparameters, compute resources).
- Innovations: Point out unique optimizations, customizations, or novel approaches.
- Solve three different ML tasks for your theme, for example:
  - Regression: to predict player performance metrics such as speed, stamina, or scoring potential.
  - Classification: to identify the genre of a song or to moderate in-game chat.
  - Clustering: to identify team formations.
  - Object detection and action recognition: to identify key actions such as goals, passes, or fouls in match footage.
  - Gaze estimation: to analyze where players are looking during critical moments.
  - Sentiment analysis: to analyze posts on social media or fan forums.
- Metrics: Show metrics relevant to your problem (e.g., accuracy, precision, recall, RMSE).
- Comparison: If applicable, compare models or baselines.
- Visuals: Include plots or charts (e.g., confusion matrix, ROC curve).
- Insights: Highlight key takeaways about the model’s performance and limitations.
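The metrics named above reduce to a few counts. A minimal plain-Python sketch follows, assuming binary labels where 1 marks a home-team win (an assumed encoding); in practice `sklearn.metrics` provides `accuracy_score`, `precision_score`, `recall_score` and `f1_score`.

```python
import math


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def rmse(y_true, y_pred):
    """Root mean squared error, for regression targets such as point totals."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Made-up labels: 1 = home team won (an assumed encoding).
print(classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
print(rmse([210, 198], [205, 201]))
```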
Deployment:
- Process: Briefly explain how the model is deployed (e.g., Flask API, cloud platform).
- Usage: Highlight how users interact with it (e.g., web app, mobile app).
- Scalability: Mention steps taken to ensure the system can handle real-world usage.
Demo:
- Brings the ML pipeline to life!
- Create a live or interactive demo to demonstrate the practical viability of the pipeline beyond just theory or code.
- Day 4
- Showcase your project with slides
- 20 minutes
- AISA poster session
- Objective: Gather up-to-date NBA game data from reliable sources.
- Tools Used: Data is scraped from basketball-reference.com with Python libraries such as Requests and Pandas.
- Outcome: A comprehensive dataset that includes game statistics, team records, and player performances.
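The scraping step can be sketched as follows. This is a minimal, dependency-free illustration, not the project's notebook: it swaps in the standard library's `urllib.request` and `html.parser` for Requests and Pandas (with those libraries, `requests.get(url).text` followed by `pandas.read_html` plays the same role). The monthly schedule URL pattern in `schedule_url` and all helper names are assumptions. `parse_table` treats the first `<tr>` it sees as the header, so pass it the HTML of a single table.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags such as <a> is kept as part of the cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Use the first row as header and map each later row to a dict.

    Duplicate header names would overwrite each other in the dicts.
    """
    parser = TableParser()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


def schedule_url(season, month):
    # Assumed URL pattern for a monthly NBA schedule page.
    return f"https://www.basketball-reference.com/leagues/NBA_{season}_games-{month}.html"


def fetch(url):
    # Identify the client and keep request rates low when scraping.
    req = Request(url, headers={"User-Agent": "SportBet study project"})
    with urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8")
```

For example, `parse_table(html_of_one_table)` on the schedule table extracted from `fetch(schedule_url(2024, "october"))` would yield one dict per game row. Check the site's terms of use before scraping.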
- Objective: Prepare the raw data for analysis.
- Tasks:
  - Handling missing values.
  - Converting categorical data to numerical formats.
  - Normalizing and scaling features for consistency.
- Outcome: A clean, structured dataset ready for model training.
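The three preprocessing tasks above can be illustrated with a small pure-Python sketch: mean imputation for missing numeric values, one-hot encoding of categorical columns, and z-score standardization. The function name, the row-dict input and the column names in the example are hypothetical; the project itself likely does this with pandas. In a real pipeline the means, standard deviations and category levels should be computed on the training split only and reused for the test split (scikit-learn's `SimpleImputer`, `OneHotEncoder` and `StandardScaler` handle this via `fit`/`transform`).

```python
import statistics


def preprocess(rows, numeric, categorical):
    """Impute, standardize and one-hot encode a list of row dicts.

    numeric / categorical: column names chosen by the caller (hypothetical).
    Assumes every numeric column has at least one value and categorical
    columns have no missing values. Returns (feature_names, matrix).
    """
    # 1. Missing values: replace None in numeric columns by the column mean.
    means = {
        col: statistics.fmean(r[col] for r in rows if r[col] is not None)
        for col in numeric
    }
    filled = [{**r, **{c: means[c] if r[c] is None else r[c] for c in numeric}} for r in rows]
    # 2. Scaling: z-score each numeric column (pstdev of 0 falls back to 1).
    stats = {}
    for col in numeric:
        values = [r[col] for r in filled]
        stats[col] = (statistics.fmean(values), statistics.pstdev(values) or 1.0)
    # 3. Categorical -> numeric: one 0/1 column per observed level.
    levels = {col: sorted({r[col] for r in filled}) for col in categorical}
    names = list(numeric) + [f"{c}={lvl}" for c in categorical for lvl in levels[c]]
    matrix = []
    for r in filled:
        vec = [(r[c] - stats[c][0]) / stats[c][1] for c in numeric]
        for c in categorical:
            vec.extend(1.0 if r[c] == lvl else 0.0 for lvl in levels[c])
        matrix.append(vec)
    return names, matrix


# Made-up rows for illustration.
rows = [{"pts": 100, "loc": "home"}, {"pts": None, "loc": "away"}, {"pts": 120, "loc": "home"}]
names, matrix = preprocess(rows, numeric=["pts"], categorical=["loc"])
print(names)
print(matrix)
```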
- Objective: Develop and compare machine learning models for predicting game outcomes.
- Methods:
  - Logistic Regression: Used for its simplicity and as a strong baseline for binary (win/loss) classification.
  - Random Forest: Used because it scales to large feature sets and captures nonlinear feature interactions, which can yield higher accuracy.
  - Support Vector Machine (SVM): Used for its maximum-margin decision boundaries, which tend to be robust.
- Outcome: The models are evaluated and compared on accuracy, precision, recall, and F1-score.
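The README does not show the training code. As a hedged illustration of the first model, here is a from-scratch logistic regression trained by batch gradient descent on the mean log-loss, with an optional L2 penalty. The function names, learning rate, epoch count and the label encoding (1 = home win) are assumptions; in the notebooks, scikit-learn's `LogisticRegression`, `RandomForestClassifier` and `SVC` would be the usual choices, though the README does not name a library.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.1, epochs=1000, l2=0.0):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss.

    X: list of feature vectors, y: list of 0/1 labels (assumed: 1 = home win).
    l2: optional L2 penalty strength applied to the weights (not the bias).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            w[j] -= lr * (grad_w[j] / n + l2 * w[j])
        b -= lr * grad_b / n
    return w, b


def predict(w, b, X, threshold=0.5):
    """Return 1 where the predicted win probability reaches the threshold."""
    return [
        1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold else 0
        for xi in X
    ]


# Toy, made-up data: one feature, e.g. a standardized rating difference.
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
w, b = train_logistic_regression(X, y)
print(predict(w, b, X))
```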
Follow these steps to set up the project on your local machine:
- Clone the repository and enter it:
  git clone https://github.com/MarkusRenner/SportBet.git
  cd SportBet
- Install the required packages:
  pip install -r requirements.txt
  or:
  conda env create -f environment.yml
- Open Jupyter Notebook or JupyterLab from the terminal:
  jupyter notebook
  or:
  jupyter lab
- Open the data scraping notebook and run it:
  web_scraping.ipynb
- Run the preprocessing and training notebooks:
  preprocessing.ipynb
  regression_training.ipynb
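Once the environment is set up, the notebooks can also be executed non-interactively, in the order listed above, with `jupyter nbconvert --to notebook --execute --inplace`. The helper below is a hypothetical convenience script, not part of the repository; it assumes the notebook file names shown above and that it is run from the directory that contains them.

```python
import subprocess

# Pipeline order taken from the setup steps (assumed file locations).
NOTEBOOKS = ("web_scraping.ipynb", "preprocessing.ipynb", "regression_training.ipynb")


def nbconvert_command(notebook):
    """Build a `jupyter nbconvert` call that executes a notebook in place."""
    return ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", notebook]


def run_pipeline(notebooks=NOTEBOOKS):
    """Execute each notebook in turn; check=True stops at the first failure."""
    for nb in notebooks:
        print(f"Running {nb} ...")
        subprocess.run(nbconvert_command(nb), check=True)
```

Calling `run_pipeline()` runs all three notebooks and raises `subprocess.CalledProcessError` at the first one that fails. Because of `--inplace`, each notebook is overwritten with its freshly computed outputs.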
- The scraped dataset of NBA stats is in the data directory.
- Detailed model comparison reports are available in the results directory.
- The final report discusses model performance and potential improvements.
- Expand Data Sources: Integrate more diverse data sources for a broader dataset.
- Advanced Modeling: Explore deep learning techniques for enhanced prediction accuracy.
- Real-Time Predictions: Implement a system for live predictions with ongoing data updates.
This project is licensed under the MIT License.
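- Export requirements.txt:
  conda list -e > requirements.txt
  This file uses conda's name=version=build format, so install it with conda create --name <env> --file requirements.txt rather than pip.
- Export environment.yaml:
  conda env export > environment.yaml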