SportBet: Predicting NBA Outcomes

AISA study project

Overview

SportBet is an AISA study project aimed at predicting NBA game outcomes through data analysis and machine learning. This project involves gathering NBA data, preprocessing it, and applying machine learning techniques to design predictive models. The primary goal is to develop a system that provides insights into betting strategies for NBA games.

TASK

Emphasize the potential real-world impact of your work:

Sports: Revolutionize coaching, fan engagement, or athlete training with ML-powered insights.
Gaming: Enhance player experiences by personalizing game content or improving multiplayer matchmaking.
Music: Democratize music creation with AI, from personalized recommendations to automated composition.

The ML pipeline

Learning Opportunity: The hackathon provides a structured opportunity to explore end-to-end ML pipelines.
Hands-on experience with:
- Data Collection: How to gather meaningful datasets.
- Preprocessing: Cleaning and transforming data.
- Model Training: Building ML models.
- Evaluation & Deployment: Validating and showcasing solutions in practical, real-world settings.

Requirements

1- Data Collection

Available sources: Mention where the data came from (e.g., APIs, web scraping, sensors, public datasets).
Collection: Collect your own data.
Challenges: Briefly discuss any difficulties (e.g., missing data, limited availability).
Volume: Indicate the size and type of data (e.g., 10GB of CSVs, 100K images).
Tools: Mention any tools used for data collection (e.g., Python scripts, Google Sheets, Mobile cameras).

2- Data Preprocessing

Steps Taken: Outline how you prepared the data (e.g., cleaning, normalization, feature engineering).
Techniques/Tools: Mention techniques (e.g., handling missing values, one-hot encoding) and tools (e.g., Pandas, NumPy).
Challenges Solved: Highlight any issues you overcame, such as imbalanced datasets or noisy data.

3- Model Training

Model Choice: State the model(s) used (e.g., Random Forest, CNN, Transformers) and why they were selected.
Frameworks: List frameworks/libraries (e.g., TensorFlow, PyTorch, Scikit-learn).
Training Specs: Mention key details (e.g., epochs, hyperparameters, compute resources).
Innovations: Point out unique optimizations, customizations, or novel approaches.
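
As one way to arrive at the hyperparameters reported under Training Specs, the sketch below runs a small cross-validated grid search with Scikit-learn. The data is synthetic and the search space is a hypothetical choice made for illustration; neither comes from this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data; a real project would load its own features and labels.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Hypothetical search space, kept small so the sketch runs quickly.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,                # 3-fold cross-validation for every parameter combination
    scoring="accuracy",
)
search.fit(X, y)

# The selected values and the cross-validated score are the details worth reporting.
print("best params:", search.best_params_)
print("mean CV accuracy:", round(search.best_score_, 3))
```

Reporting the searched grid next to the selected values makes the training setup reproducible.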

4- ML Tasks

Solve 3 different ML tasks for your theme, for example:
- Regression: to predict player performance metrics like speed, stamina, or scoring potential.
- Classification: to identify the genre of a song or to moderate in-game chat.
- Clustering: to identify team formations.
- Object detection and action recognition: to identify key actions like goals, passes, or fouls in match footage.
- Gaze estimation: to analyze where players are looking during critical moments.
- Sentiment analysis: to gauge opinion on social media or fan forums.

5- Model Evaluation

Metrics: Show metrics relevant to your problem (e.g., accuracy, precision, recall, RMSE).
Comparison: If applicable, compare models or baselines.
Visuals: Include plots or charts (e.g., confusion matrix, ROC curve).
Insights: Highlight key takeaways about the model’s performance and limitations.
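
To make the metrics listed above concrete, the following minimal sketch computes accuracy, precision, recall, F1, and RMSE in plain Python. The function names and the labels are made up for illustration (1 = home win is an assumed encoding), and the numbers are not project results.

```python
from math import sqrt


def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def rmse(y_true, y_pred):
    """Root mean squared error for regression targets."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Made-up labels for illustration only (1 = home win, 0 = home loss).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In practice the same values are available from Scikit-learn's metrics module; writing them out once shows what each number measures.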

Bonus

Deployment:
- Process: Briefly explain how the model is deployed (e.g., Flask API, cloud platform).
- Usage: Highlight how users interact with it (e.g., web app, mobile app).
- Scalability: Mention steps taken to ensure the system can handle real-world usage.
Demo:
- Brings the ML pipeline to life!
- Create a live or interactive demo to demonstrate the practical viability of the pipeline beyond just theory or code.

Presentation

Day 4
Showcase your project with slides
20 minutes
AISA poster session

Project Workflow

1. Data Collection via Web Scraping

Objective: Gather up-to-date NBA game data from reliable sources.
Tools Used: Python libraries such as Requests and Pandas are used to scrape data from basketball-reference.com.
Outcome: A comprehensive dataset that includes game statistics, team records, and player performances.
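
The scraping step described above could look roughly like the sketch below. It is not the project's actual notebook code: the function names, the User-Agent string, and the monthly schedule URL pattern are assumptions made for illustration.

```python
from io import StringIO

import pandas as pd
import requests

BASE_URL = "https://www.basketball-reference.com"


def schedule_url(season_end_year: int, month: str) -> str:
    """Build the URL of a monthly schedule page (the URL pattern is an assumption)."""
    return f"{BASE_URL}/leagues/NBA_{season_end_year}_games-{month.lower()}.html"


def fetch_schedule(season_end_year: int, month: str, timeout: float = 30.0) -> pd.DataFrame:
    """Download one schedule page and return its first HTML table as a DataFrame."""
    response = requests.get(
        schedule_url(season_end_year, month),
        headers={"User-Agent": "SportBet study project"},  # hypothetical identifier
        timeout=timeout,
    )
    response.raise_for_status()  # fail loudly on 4xx/5xx responses such as rate limiting
    # read_html needs an HTML parser such as lxml (or bs4 with html5lib) installed.
    tables = pd.read_html(StringIO(response.text))
    return tables[0]
```

Calling fetch_schedule(2024, "october") would request a single page. When looping over many months or seasons, pausing between requests keeps the load on the site low.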

2. Data Preprocessing

Objective: Prepare the raw data for analysis.
Tasks:
- Handling missing values.
- Converting categorical data to numerical formats.
- Normalizing and scaling features for consistency.
Outcome: A clean, structured dataset ready for model training.
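
The three preprocessing tasks above can be sketched with Pandas as follows. The rows and column names are made up for illustration and do not reflect the project's real schema.

```python
import pandas as pd

# Made-up rows for illustration; the column names are hypothetical.
raw = pd.DataFrame(
    {
        "team": ["BOS", "LAL", "BOS", "DEN"],
        "points": [112.0, None, 98.0, 120.0],
        "rebounds": [44.0, 39.0, None, 51.0],
        "home_win": [1, 0, 0, 1],
    }
)

numeric_cols = ["points", "rebounds"]


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Impute, encode and scale a copy of the input frame."""
    out = df.copy()
    # 1. Missing values: fill numeric gaps with the column median.
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].median())
    # 2. Categorical to numeric: one-hot encode the team abbreviation.
    out = pd.get_dummies(out, columns=["team"], dtype=int)
    # 3. Scaling: standardize numeric features to zero mean and unit variance.
    means = out[numeric_cols].mean()
    stds = out[numeric_cols].std(ddof=0)
    out[numeric_cols] = (out[numeric_cols] - means) / stds
    return out


clean = preprocess(raw)
print(clean)
```

In a real pipeline the medians, means, and standard deviations should be computed on the training split only and then reused on the test split, so that no information leaks from the test data.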

3. Machine Learning Models

Objective: Develop and compare machine learning models for predicting game outcomes.
Methods:
- Logistic Regression: Used as a simple, interpretable baseline that performs well on binary classification tasks.
- Random Forest: Used because it scales to large datasets and often achieves higher accuracy than a single linear model.
- Support Vector Machine (SVM): Used to find a robust, maximum-margin decision boundary.
Outcome: Evaluation of model performance based on accuracy, precision, recall, and F1-score.
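
A comparison of the three models on the four metrics could be set up as in the sketch below. It uses synthetic data and default hyperparameters as stand-ins, so the printed scores say nothing about the project's actual results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic binary data standing in for the NBA features (1 = win, 0 = loss).
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Scaling matters for logistic regression and the SVM, not for the tree ensemble.
models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "svm": make_pipeline(StandardScaler(), SVC()),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)  # evaluate on held-out games only
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    }

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Keeping the same train/test split for all three models is what makes the scores comparable.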

Installation and Setup

Follow these steps to set up the project on your local machine:

Clone the repository:
```
git clone https://github.com/MarkusRenner/SportBet.git
```
Install the required packages with pip:
```
pip install -r requirements.txt
```
or create the conda environment:
```
conda env create -f environment.yaml
```
Open Jupyter Notebook or JupyterLab from the terminal:
```
jupyter notebook
```
or
```
jupyter lab
```
Open the data scraping notebook and run it:
```
web_scraping.ipynb
```
Then run the preprocessing and training notebooks:
```
preprocessing.ipynb
regression_training.ipynb
```

Data

Find the scraped dataset of NBA stats in the data directory.

Results and Insights

Detailed model comparison reports are available in the results directory.
The final report discusses model performance and potential improvements.

Future Work

Expand Data Sources: Integrate more diverse data sources for a broader dataset.
Advanced Modeling: Explore deep learning techniques for enhanced prediction accuracy.
Real-Time Predictions: Implement a system for live predictions with ongoing data updates.

License

This project is licensed under the MIT License.

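How to save the conda environment

requirements.txt
```
conda list -e > requirements.txt
```
environment.yaml
```
conda env export > environment.yaml
```
Note: conda list -e writes conda's name=version=build format, which conda create --name <env> --file requirements.txt can read. pip expects name==version lines and may reject a file in this format.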
