mlb-data-lab is a Python application and library that generates comprehensive summary sheets of advanced stats for MLB players, customizable by year, with in-depth analysis and visualizations. Used as a library, it lets you build your own features and extend its functionality for custom applications and data-processing needs. Built on the pybaseball module, the MLB-StatsAPI module, and other Python libraries, the project handles collecting, analyzing, and formatting data for use in reports, dashboards, and other analytical tools.
The project sources its data from MLB and Fangraphs to keep statistics accurate and up to date. Future updates will expand its features, both as a standalone tool and as a library for integration into other projects.
Below are samples of the summary sheets that can be generated by this project. The first sample is a Batting Summary for Riley Greene for the 2024 season. The second sample is a Pitching Summary for Tarik Skubal for the 2024 season.
In addition to the baseball stats you would expect, the summary sheets also include the following "advanced" stats:
Batters | | Pitchers | | |
---|---|---|---|---|
BB% | UBR | K/9 | Opponent Avg | Swing % |
K% | wRC | BB/9 | WHIP | Splits |
OBP | wRAA | K/BB | BABIP | |
SLG | wOBA | H/9 | LOB% | |
OPS | wRC+ | HR/9 | ERA- | |
ISO | WAR | K% | FIP- | |
Spd | Splits | BB% | FIP | |
BABIP | | K-BB% | RS/9 | |
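Several of the rate stats in the table follow standard public formulas and can be derived from season counting stats alone. The sketch below is an illustration, not the project's own code: the `BattingLine` class and the function names are hypothetical, and the default FIP constant of 3.10 is an assumed placeholder (the real constant is recalculated each season). Stats such as wOBA, wRC+, and WAR depend on league-wide weights and are best taken from Fangraphs directly.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BattingLine:
    """Hypothetical container for a batter's season counting stats."""
    pa: int       # plate appearances
    ab: int       # at-bats
    h: int        # hits
    doubles: int
    triples: int
    hr: int       # home runs
    bb: int       # walks
    hbp: int      # hit by pitch
    sf: int       # sacrifice flies
    so: int       # strikeouts


def batting_rates(b: BattingLine) -> dict[str, float]:
    """Rate stats derived purely from counting stats (no league weights needed)."""
    singles = b.h - b.doubles - b.triples - b.hr
    total_bases = singles + 2 * b.doubles + 3 * b.triples + 4 * b.hr
    avg = b.h / b.ab
    obp = (b.h + b.bb + b.hbp) / (b.ab + b.bb + b.hbp + b.sf)
    slg = total_bases / b.ab
    return {
        "BB%": b.bb / b.pa,
        "K%": b.so / b.pa,
        "OBP": obp,
        "SLG": slg,
        "OPS": obp + slg,
        "ISO": slg - avg,  # isolated power: extra bases per at-bat
        "BABIP": (b.h - b.hr) / (b.ab - b.so - b.hr + b.sf),
    }


def innings_from_notation(ip: float) -> float:
    """Convert box-score innings (6.1 = 6 1/3, 6.2 = 6 2/3) to true innings."""
    whole = int(ip)
    outs = round((ip - whole) * 10)  # the digit after the point counts outs
    return whole + outs / 3


def pitching_rates(ip: float, h: int, hr: int, bb: int, hbp: int, so: int,
                   fip_constant: float = 3.10) -> dict[str, float]:
    """Per-nine and ratio stats; fip_constant is an assumed placeholder value."""
    innings = innings_from_notation(ip)
    return {
        "K/9": 9 * so / innings,
        "BB/9": 9 * bb / innings,
        "H/9": 9 * h / innings,
        "HR/9": 9 * hr / innings,
        "K/BB": so / bb,
        "WHIP": (bb + h) / innings,
        "FIP": (13 * hr + 3 * (bb + hbp) - 2 * so) / innings + fip_constant,
    }
```

Innings are converted from box-score notation first because "6.1" innings means six and one-third, not six and one-tenth; using the raw value would skew every per-nine stat.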
The mlb-data-lab project is organized as follows:
mlb_stats/
├── README.md # Project documentation
├── setup.py # Setup file for packaging and installation
├── requirements.txt # Dependencies for the project
├── mlb_stats/
│ ├── apis/
│ │ ├── stats_api.py # API client for fetching MLB stats
│ │ ├── fangraphs_client.py # API client for Fangraphs data
│ ├── components/
│ │ ├── stats_table.py # Class for generating stats tables
│ ├── data/
│ │ ├──
│ ├── data_viz/
│ │ ├── batting_spray_chart.py
│ │ ├── pitch_break_plot.py
│ │ ├── pitch_breakdown_table.py
│ │ ├── pitch_velocity_distribution_plot.py
│ │ ├── rolling_pitch_usage_plot.py
│ │ ├── plotting.py
│ ├── player/
│ │ ├── player.py
│ │ ├── player_bio.py
│ │ ├── player_info.py
│ │ ├── player_lookup.py
│ ├── stats/
│ │ ├──
│ ├── summary_sheets/
│ │ ├── batter_summary_sheet.py
│ │ ├── pitcher_summary_sheet.py
│ │ ├── summary_sheet.py
│ │ ├── team_summary_sheet.py
│ ├── team/
│ │ ├── roster.py
│ │ ├── team.py
│ ├── config.py
│ ├── constants.py
│ ├── utils.py
├── scripts/
│ ├── generate_player_summary.py
│ ├── save_statcast_data.py
├── tests/
│ ├──
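- mlb_stats/: Core application logic and components.
  - apis/: API clients for retrieving stats from external services such as MLB and Fangraphs.
  - components/: Reusable building blocks for the summary sheets, such as the stats table generator (stats_table.py).
  - data/:
  - data_viz/: Visualizations, including batting spray charts, pitch break plots, pitch breakdown tables, pitch velocity distributions, and rolling pitch usage plots.
  - player/: Modules for player data, biographical info, and player lookup.
  - stats/:
  - summary_sheets/: Batter, pitcher, and team summary sheet generators, plus the shared summary_sheet.py.
  - team/: Team and roster modules.
- scripts/: Scripts for generating player summary sheets and saving Statcast data.
- tests/: Unit tests that verify the functionality of the project's components and modules.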
To get started with the project, follow these steps:
- Clone the repository:
git clone https://github.com/timothyf/mlb_stats.git
cd mlb_stats
- Set up a Python virtual environment (optional but recommended):
python3 -m venv venv
source venv/bin/activate
- Install the required dependencies:
pip install -r requirements.txt
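To confirm the install worked, a quick check like the one below can report whether the two data libraries named in this README are importable. It assumes requirements.txt includes pybaseball and MLB-StatsAPI (its contents aren't listed here); note that MLB-StatsAPI is installed under that name but imported as `statsapi`. The `check_dependencies` helper is hypothetical and only uses the standard library.

```python
import importlib.util

# Package name (as installed with pip) -> module name (as imported in Python).
REQUIRED_MODULES = {
    "pybaseball": "pybaseball",
    "MLB-StatsAPI": "statsapi",
}


def check_dependencies(modules=REQUIRED_MODULES):
    """Return {package name: True/False} depending on whether it can be imported."""
    return {
        package: importlib.util.find_spec(module) is not None
        for package, module in modules.items()
    }


if __name__ == "__main__":
    for package, ok in check_dependencies().items():
        print(f"{package}: {'OK' if ok else 'MISSING'}")
```

`find_spec` locates a module without importing it, so the check is fast and won't trigger any network or cache setup those libraries might do on import.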
The scripts directory contains several scripts that provide basic functionality:
python scripts/generate_player_summary.py [options]
Options:
--players [1 or more player names]
--teams [1 or more team names]
--year [specify a 4-digit year]
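The script's source isn't reproduced in this README, so as an illustration only, here is one way the documented options could be declared with Python's standard `argparse` module. The defaults (empty lists, no year) and the `year_arg` validator are assumptions, not taken from generate_player_summary.py.

```python
import argparse


def year_arg(value: str) -> int:
    """Accept only 4-digit years, matching the documented --year option."""
    if len(value) != 4 or not value.isdigit():
        raise argparse.ArgumentTypeError(f"expected a 4-digit year, got {value!r}")
    return int(value)


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser for --players, --teams, and --year."""
    parser = argparse.ArgumentParser(
        description="Generate MLB player summary sheets (illustrative parser)."
    )
    # nargs="+" collects one or more values after the flag into a list.
    parser.add_argument("--players", nargs="+", default=[], metavar="NAME",
                        help="one or more player names, e.g. 'Riley Greene'")
    parser.add_argument("--teams", nargs="+", default=[], metavar="TEAM",
                        help="one or more team names, e.g. 'Detroit Tigers'")
    parser.add_argument("--year", type=year_arg, default=None,
                        help="4-digit season year")
    return parser
```

Because each option takes one or more space-separated values, multi-word names must be quoted in the shell (for example `--players 'Riley Greene' 'Tarik Skubal'`) so each name arrives as a single value.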
To save Statcast data, run save_statcast_data.py from the scripts directory:
python scripts/save_statcast_data.py [options]
Options:
--players [1 or more player names]
--teams [1 or more team names]
--year [specify a 4-digit year]
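How save_statcast_data.py works internally isn't documented here. As a rough sketch of the same idea for a single batter, the code below uses pybaseball's `playerid_lookup` (to find the MLBAM player id) and `statcast_batter` (to download pitch-level data), then writes a CSV. The season date window, the output file name, and all helper names are hypothetical; pitchers would use pybaseball's `statcast_pitcher` instead.

```python
from __future__ import annotations

from pathlib import Path


def season_window(year: int) -> tuple[str, str]:
    """Broad window meant to cover the regular season and postseason (dates assumed)."""
    return f"{year}-03-01", f"{year}-11-30"


def split_name(full_name: str) -> tuple[str, str]:
    """Split 'Riley Greene' into ('Riley', 'Greene'); extra words stay in the last name."""
    first, _, last = full_name.strip().partition(" ")
    return first, last


def statcast_csv_name(full_name: str, year: int) -> str:
    """Hypothetical output file name, e.g. 'riley_greene_2024_statcast.csv'."""
    slug = "_".join(full_name.lower().split())
    return f"{slug}_{year}_statcast.csv"


def save_batter_statcast(full_name: str, year: int, out_dir: str | Path = "statcast") -> Path:
    """Look up a batter's MLBAM id, fetch Statcast data, and save it as CSV."""
    # Imported inside the function so the helpers above work without pybaseball.
    from pybaseball import playerid_lookup, statcast_batter

    first, last = split_name(full_name)
    matches = playerid_lookup(last, first)  # DataFrame; may hold several rows
    if matches.empty:
        raise ValueError(f"no player found for {full_name!r}")
    mlbam_id = int(matches["key_mlbam"].iloc[0])  # naively takes the first match

    start, end = season_window(year)
    data = statcast_batter(start, end, player_id=mlbam_id)

    out_path = Path(out_dir) / statcast_csv_name(full_name, year)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    data.to_csv(out_path, index=False)
    return out_path
```

Taking the first lookup match is a simplification; players who share a name would need disambiguation, for example by checking the debut year columns that `playerid_lookup` returns.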
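For example, to generate a batter summary sheet for a single player:
python scripts/generate_player_summary.py --players 'Riley Greene'
Output:
mlb_stats/output/2024/Tigers/batter_summary_riley_greene.png
To generate summary sheets for a team and season:
python scripts/generate_player_summary.py --teams 'Detroit Tigers' --year 2024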
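The example output suggests a layout of `output/<year>/<team>/<kind>_summary_<player>.png`. The helper below reproduces that shape so you can predict where a sheet will land; the layout is inferred from the single example above rather than confirmed from the code, and `summary_sheet_path` is a hypothetical name.

```python
from pathlib import Path


def summary_sheet_path(year: int, team: str, player_name: str,
                       kind: str = "batter", root: str = "mlb_stats/output") -> Path:
    """Build a path shaped like the example output (layout inferred, not confirmed)."""
    slug = "_".join(player_name.lower().split())  # 'Riley Greene' -> 'riley_greene'
    return Path(root) / str(year) / team / f"{kind}_summary_{slug}.png"
```

Note that the team folder in the example is the short club name ("Tigers"), not the full name passed to `--teams`.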
This project was inspired by my time working in the R&D department of the Washington Nationals and by Thomas Nestico's pitching summary project. Here is a link to an article describing his project:
https://medium.com/@thomasjamesnestico/creating-the-perfect-pitching-summary-7b8a981ef0c5
This package and its author are not affiliated with MLB or any MLB team. This package interfaces with MLB's Stats API, and use of MLB data is subject to the notice posted at http://gdx.mlb.com/components/copyright.txt.