Skip to content

deepesh1singh/ML-Cryptanalysis

Repository files navigation

Machine Learning in Cryptanalysis

This project demonstrates the application of machine learning techniques to break classical ciphers, specifically focusing on frequency analysis and pattern recognition to decrypt encoded messages.

Project Overview

The project implements ML-based attacks on classical ciphers including:

  • Caesar Cipher: Shift cipher with frequency analysis
  • Vigenère Cipher: Polyalphabetic substitution cipher
  • Substitution Cipher: General monoalphabetic substitution

Features

  • Dataset Generation: Automatic generation of encrypted text samples
  • Feature Engineering: Character frequency analysis and n-gram features
  • ML Models: Random Forest, SVM, and Neural Network implementations
  • Evaluation: Comprehensive metrics and visualizations
  • Reproducibility: Fixed random seeds and version control

Tools and Technologies

Core Machine Learning & Data Science

  • NumPy (≥1.21.0) - Numerical computing and array operations
  • Pandas (≥1.3.0) - Data manipulation and analysis
  • Scikit-learn (≥1.0.0) - Machine learning algorithms and utilities
  • SciPy (≥1.7.0) - Scientific computing

Deep Learning (Optional)

  • TensorFlow (≥2.8.0) - Deep learning framework
  • PyTorch (≥1.10.0) - Deep learning framework
  • Keras (≥2.8.0) - High-level neural network API

Visualization

  • Matplotlib (≥3.5.0) - Plotting and visualization
  • Seaborn (≥0.11.0) - Statistical data visualization
  • Plotly (≥5.0.0) - Interactive plotting

Development & Analysis

  • Jupyter (≥1.0.0) - Interactive notebooks
  • IPyKernel (≥6.0.0) - Jupyter kernel
  • Notebook (≥6.4.0) - Web-based notebook interface

Data Processing & NLP

  • NLTK (≥3.6.0) - Natural language processing
  • Textstat (≥0.7.0) - Text statistics
  • Langdetect (≥1.0.9) - Language detection

Configuration & Utilities

  • PyYAML (≥6.0) - YAML configuration files
  • Python-dotenv (≥0.19.0) - Environment variable management
  • TQDM (≥4.62.0) - Progress bars

Testing & Code Quality

  • Pytest (≥6.2.0) - Testing framework
  • Black (≥21.0.0) - Code formatting
  • Flake8 (≥3.9.0) - Code linting

Model Persistence

  • Joblib (≥1.1.0) - Model serialization
  • Pickle5 (≥0.0.11) - Python object serialization

Key ML Algorithms

  • Random Forest Classifier - For cipher classification
  • Support Vector Machine (SVM) - With RBF and linear kernels
  • Neural Networks (MLPClassifier) - Multi-layer perceptron
  • GridSearchCV - Hyperparameter tuning
  • Cross-validation - Model evaluation

Cryptanalysis Techniques

  • Character Frequency Analysis - English letter frequency patterns
  • N-gram Analysis - Bigram and trigram frequency features
  • Statistical Features - Text statistics and entropy
  • Frequency Deviation - Deviation from expected English frequencies

Directory Structure

ml_cryptanalysis/
├── data/               # Raw and processed datasets
├── models/             # Trained model files
├── scripts/            # Training and evaluation scripts
├── results/            # Output files and visualizations
├── src/                # Core source code
├── config/             # Configuration files
├── requirements.txt    # Python dependencies
└── README.md          # This file

Setup Instructions

1. Clone the Repository

git clone <repository-url>
cd ml_cryptanalysis

2. Create Virtual Environment

python -m venv venv
# On Windows:
venv\Scripts\activate
# On Unix/MacOS:
source venv/bin/activate

3. Install Dependencies

pip install -r requirements.txt

4. Download Data (if needed)

python scripts/generate_data.py

Usage

Training Models

python scripts/train_model.py --cipher caesar --model random_forest

Evaluating Models

python scripts/evaluate_model.py --model_path models/caesar_rf_model.pkl

Interactive Analysis

jupyter notebook notebooks/cryptanalysis_analysis.ipynb

Project Components

1. Data Generation (src/data_generator.py)

  • Generates encrypted text samples using classical ciphers
  • Creates balanced datasets for training
  • Supports multiple cipher types and key lengths

2. Feature Engineering (src/feature_extractor.py)

  • Character frequency analysis
  • N-gram feature extraction
  • Statistical pattern recognition

3. Model Training (scripts/train_model.py)

  • Trains ML models on encrypted text
  • Supports multiple algorithms (Random Forest, SVM, Neural Networks)
  • Cross-validation and hyperparameter tuning

4. Evaluation (scripts/evaluate_model.py)

  • Model performance assessment
  • Confusion matrix and classification reports
  • Visualization of results

Results

The project achieves:

  • Caesar Cipher: ~95% accuracy with frequency analysis
  • Vigenère Cipher: ~85% accuracy with n-gram features
  • Substitution Cipher: ~70% accuracy with advanced features

License

This project is licensed under the MIT License - see the LICENSE file for details.

Upgraded features added

  • XGBoost + Optuna hyperparameter tuning in scripts/train_model.py
  • Starter character-level Transformer training script in scripts/train_transformer.py
  • Advanced evaluator scripts/scripts/evaluate_model_advanced.py that saves confusion matrix and report
  • CLI placeholder cryptanalysis.py and Streamlit demo app_streamlit.py
  • Dockerfile, requirements-advanced.txt, GitHub Actions CI, and notebooks placeholders

About

Machine Learning in Cryptanalysis - ML-based attacks on classical ciphers

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors