- Project Overview
- Dataset
- Preprocessing
- Libraries Used
- Data Pipeline Preparation
- Model Processing
- Demo
The Chess Openings Analyzer applies machine learning (ML) techniques, including logistic regression and neural networks, to analyze the success rates of various chess openings. The project predicts game results from openings, ELO ratings, and other features in the dataset, and uses Gemini AI to explain the moves suggested by the trained model. Clustering techniques are also employed to analyze the outcomes of different openings, giving players valuable insight into strategic decision-making.
The dataset used in this project is sourced from Kaggle and contains 6.25 million chess games played on lichess.org during July 2016. Each row represents a single game, providing valuable insight into player strategies, game dynamics, and opening variations.
A basic overview of the preprocessing steps applied to the Chess Games dataset is outlined below:
- Find and Drop Missing Values: Identify and drop rows with missing values.
- Drop NaNs: Remove any remaining NaN values.
- Drop Unnecessary Columns: Remove the "White" and "Black" columns, as they only contain IDs that are not useful for analysis.
- Encode Game Results: Encode game results into three categories: 1 for White win, 0 for Black win, and 2 for tie.
- Drop UTC Date and Time: Remove UTC date and time columns as they are not required for analysis.
- Categorize Events: Categorize and encode event types into fewer categories (e.g., Blitz, Blitz tournament).
- Scale Elo Ratings: Scale White and Black Elo ratings for analysis.
- Drop Rating Differences: Drop columns for White and Black rating differences as they provide limited value.
- Create Opening DataFrame: Create a separate DataFrame for opening moves using the ECO code for mapping.
- Drop Time Control: Temporarily drop the time control column; it may be useful for later analysis.
- Extract Individual Moves: Parse and extract individual moves from the AN column to analyze opening moves in detail.
- Encode Termination Conditions: Encode termination conditions as "normal" (1) and "time forfeit" (0), while dropping instances with conditions such as "Abandoned" and "Rules Infraction".
These preprocessing steps help prepare the dataset for further analysis and machine learning modeling.
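The steps above can be sketched in pandas. The column names (`Result`, `WhiteElo`, `BlackElo`, `White`, `Black`, `UTCDate`, `UTCTime`, `Termination`) are assumed from the Kaggle/lichess export and may need to be adjusted to match the actual CSV:

```python
# Minimal sketch of the preprocessing steps, on a toy two-row frame;
# column names are assumptions based on the lichess Kaggle dataset.
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "White": ["a", "b"], "Black": ["c", "d"],          # player IDs
    "Result": ["1-0", "0-1"],
    "WhiteElo": [1500, 1820], "BlackElo": [1600, 1710],
    "UTCDate": ["2016.07.01"] * 2, "UTCTime": ["12:00:00"] * 2,
    "Termination": ["Normal", "Time forfeit"],
})

df = df.dropna()                                       # drop missing values
df = df.drop(columns=["White", "Black",                # IDs: not useful
                      "UTCDate", "UTCTime"])           # not required
result_map = {"1-0": 1, "0-1": 0, "1/2-1/2": 2}        # encode game results
df["Result"] = df["Result"].map(result_map)
term_map = {"Normal": 1, "Time forfeit": 0}            # encode termination
df["Termination"] = df["Termination"].map(term_map)
df[["WhiteElo", "BlackElo"]] = StandardScaler().fit_transform(
    df[["WhiteElo", "BlackElo"]])                      # scale Elo ratings
print(df)
```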
I used machine learning algorithms to analyze the results and select an efficient model for the various data types. My approach includes the following steps:
- I wrote a `DataFrameSelector` class that inherits from `BaseEstimator` and `TransformerMixin` to select the required attributes of a data frame. The class has `__init__`, `fit`, and `transform` methods and accepts a list of attribute names.
- Two pipelines, `num_pipeline` and `cat_pipeline`, handle numerical and categorical data, respectively. The `num_pipeline` selects the numerical attributes and uses `SimpleImputer` to fill missing values with the mean. The `cat_pipeline` selects the categorical attributes and fills missing values with the most frequent value using `SimpleImputer`; it also applies `LabelEncoder` to encode categorical values as numbers.
- `FeatureUnion` combines the numerical and categorical pipelines into a single pipeline named `data_prep_pipeline`.
- The `run_model` function takes a machine learning model and the training, validation, and test data as input. It builds a full pipeline consisting of `data_prep_pipeline` and the given model, fits it on the training data, and returns the accuracy, AUC score, and other performance metrics on the training, validation, and test sets. It also records the model's fit time and appends all metrics to a pandas data frame named `expLog`.
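The selector and pipelines described above can be sketched as follows. The attribute names are placeholders, and `OrdinalEncoder` stands in for `LabelEncoder`, which is intended for target labels and does not fit scikit-learn's transformer interface inside a pipeline:

```python
# Sketch of the data-prep pipeline; column names are illustrative.
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OrdinalEncoder

class DataFrameSelector(BaseEstimator, TransformerMixin):
    """Select a list of columns from a DataFrame as a NumPy array."""
    def __init__(self, attribute_names):
        self.attribute_names = attribute_names
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return X[self.attribute_names].values

num_attribs = ["WhiteElo", "BlackElo"]        # placeholder columns
cat_attribs = ["Event"]

num_pipeline = Pipeline([
    ("selector", DataFrameSelector(num_attribs)),
    ("imputer", SimpleImputer(strategy="mean")),       # fill with mean
])
cat_pipeline = Pipeline([
    ("selector", DataFrameSelector(cat_attribs)),
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("encoder", OrdinalEncoder()),                     # categories -> numbers
])
data_prep_pipeline = FeatureUnion([                    # combine both branches
    ("num", num_pipeline),
    ("cat", cat_pipeline),
])

df = pd.DataFrame({
    "WhiteElo": [1500, np.nan, 1800],
    "BlackElo": [1600, 1700, np.nan],
    "Event": ["Blitz", np.nan, "Classical"],
})
X = data_prep_pipeline.fit_transform(df)
print(X.shape)  # (3, 3)
```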
I have used a variety of algorithms to power the project's functionality. The classification algorithms employed include:
- Decision Tree Classifier
- RandomForest Classifier
- AdaBoost Classifier
- GradientBoost Classifier
- CatBoost Classifier
- XGBM Classifier
- KNN Classifier
- Gaussian Naive Bayes
- Multi-Layer Perceptron Neural Networks
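A minimal way to compare the scikit-learn members of the list above is to fit each on the same split and record accuracy, much as the `run_model` function does; synthetic data stands in for the chess features here, and CatBoost and XGBoost follow the same fit/predict interface but need their own packages:

```python
# Sketch: fit several of the listed classifiers and compare accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (RandomForestClassifier, AdaBoostClassifier,
                              GradientBoostingClassifier)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the preprocessed chess features.
X, y = make_classification(n_samples=400, n_features=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

models = {
    "DecisionTree": DecisionTreeClassifier(random_state=42),
    "RandomForest": RandomForestClassifier(random_state=42),
    "AdaBoost": AdaBoostClassifier(random_state=42),
    "GradientBoost": GradientBoostingClassifier(random_state=42),
    "KNN": KNeighborsClassifier(),
    "GaussianNB": GaussianNB(),
    "MLP": MLPClassifier(max_iter=1000, random_state=42),
}
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = accuracy_score(y_te, model.predict(X_te))
    print(f"{name}: {scores[name]:.3f}")
```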
The Flask application consists of the following files:
- `Model.py`: Responsible for running user moves against the model and providing predictions.
- `flask_app.py`: Handles user moves and provides results.
- `chess_engine.py`: Provides the GUI for the chess interface.
- `board_test.py`: Used for testing and validating user chess moves.
- CREATE A VIRTUAL ENVIRONMENT
```
python3 -m venv <virtual-environment-name>
```
- CREATE AN .env FILE AND ADD THE FOLLOWING LINES TO IT
```
FLASK_APP=flask_app.py
FLASK_DEBUG=1
FLASK_RUN_PORT=3000
APP_SECRET_KEY=IOAJODJAD89ADYU9A78YGD
```
- THEN RUN THE FOLLOWING IN THE TERMINAL WITHIN THE ENVIRONMENT
```
pip3 install -r requirements.txt
```
- TO RUN THE SERVER USE
```
flask run
```
OR
```
python3 flask_app.py
```
- GENERATE API KEY FOR GEMINI AI
To integrate Gemini AI, you need to obtain an API key from the Gemini AI platform. Insert this API key in your flask_app.py
file to authenticate requests to the Gemini AI API.