This project applies machine learning to breaking classical ciphers, using frequency analysis and pattern recognition to decrypt encoded messages.
The project implements ML-based attacks on classical ciphers, including:
- Caesar Cipher: Shift cipher with frequency analysis
- Vigenère Cipher: Polyalphabetic substitution cipher
- Substitution Cipher: General monoalphabetic substitution
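For reference, the three ciphers can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the project's own implementation: the function names are hypothetical, text is upper-cased, non-letters pass through unchanged, and keys are assumed to contain only A-Z.

```python
import random
import string

ALPHABET = string.ascii_uppercase


def caesar_encrypt(text: str, shift: int) -> str:
    """Shift every letter by `shift` positions; non-letters pass through."""
    out = []
    for ch in text.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)
    return "".join(out)


def vigenere_encrypt(text: str, key: str) -> str:
    """Shift each letter by the matching key letter (key assumed to be A-Z only).

    The key position advances only when a letter is encrypted.
    """
    shifts = [ALPHABET.index(k) for k in key.upper()]
    out, i = [], 0
    for ch in text.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shifts[i % len(shifts)]) % 26])
            i += 1
        else:
            out.append(ch)
    return "".join(out)


def substitution_encrypt(text: str, key: str) -> str:
    """Map A..Z onto `key`, which is assumed to be a 26-letter permutation."""
    table = str.maketrans(ALPHABET, key.upper())
    return text.upper().translate(table)


def random_substitution_key(rng: random.Random) -> str:
    """Draw a random permutation of the alphabet from a seeded generator."""
    letters = list(ALPHABET)
    rng.shuffle(letters)
    return "".join(letters)


print(caesar_encrypt("HELLO", 3))                  # KHOOR
print(vigenere_encrypt("ATTACKATDAWN", "LEMON"))   # LXFOPVEFRNHR
```

A Caesar cipher has only 25 non-trivial keys and a Vigenère cipher repeats its key, while a general substitution has 26! possible keys, which is consistent with it being the hardest of the three to attack.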
Key features:
- Dataset Generation: Automatic generation of encrypted text samples
- Feature Engineering: Character frequency analysis and n-gram features
- ML Models: Random Forest, SVM, and Neural Network implementations
- Evaluation: Comprehensive metrics and visualizations
- Reproducibility: Fixed random seeds and version control
Dependencies:
- NumPy (≥1.21.0) - Numerical computing and array operations
- Pandas (≥1.3.0) - Data manipulation and analysis
- Scikit-learn (≥1.0.0) - Machine learning algorithms and utilities
- SciPy (≥1.7.0) - Scientific computing
- TensorFlow (≥2.8.0) - Deep learning framework
- PyTorch (≥1.10.0) - Deep learning framework
- Keras (≥2.8.0) - High-level neural network API
- Matplotlib (≥3.5.0) - Plotting and visualization
- Seaborn (≥0.11.0) - Statistical data visualization
- Plotly (≥5.0.0) - Interactive plotting
- Jupyter (≥1.0.0) - Interactive notebooks
- IPyKernel (≥6.0.0) - Jupyter kernel
- Notebook (≥6.4.0) - Web-based notebook interface
- NLTK (≥3.6.0) - Natural language processing
- Textstat (≥0.7.0) - Text statistics
- Langdetect (≥1.0.9) - Language detection
- PyYAML (≥6.0) - YAML configuration files
- Python-dotenv (≥0.19.0) - Environment variable management
- TQDM (≥4.62.0) - Progress bars
- Pytest (≥6.2.0) - Testing framework
- Black (≥21.0.0) - Code formatting
- Flake8 (≥3.9.0) - Code linting
- Joblib (≥1.1.0) - Model serialization
- Pickle5 (≥0.0.11) - Backport of pickle protocol 5 for Python 3.5-3.7; on Python 3.8+ the standard-library pickle module already supports it
Machine learning algorithms and tuning:
- Random Forest Classifier - For cipher classification
- Support Vector Machine (SVM) - With RBF and linear kernels
- Neural Networks (MLPClassifier) - Multi-layer perceptron
- GridSearchCV - Hyperparameter tuning
- Cross-validation - Model evaluation
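The sketch below shows how these pieces might fit together in scikit-learn. It is an assumption-laden illustration rather than the contents of scripts/train_model.py: the data is a synthetic stand-in produced by `make_classification`, and the parameter grids are arbitrary small examples.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

SEED = 42  # fixed seed so repeated runs give the same splits and models

# Synthetic stand-in data (assumption): 29 columns mimic a text feature
# vector, and the 3 classes mimic three cipher types.
X, y = make_classification(
    n_samples=300, n_features=29, n_informative=10, n_classes=3, random_state=SEED
)

# Each entry pairs an estimator with an illustrative hyperparameter grid.
# SVM and MLP are wrapped in a pipeline because they are sensitive to feature scale.
candidates = {
    "random_forest": (
        RandomForestClassifier(random_state=SEED),
        {"n_estimators": [50, 100], "max_depth": [None, 10]},
    ),
    "svm": (
        make_pipeline(StandardScaler(), SVC()),
        {"svc__kernel": ["rbf", "linear"], "svc__C": [0.1, 1, 10]},
    ),
    "mlp": (
        make_pipeline(StandardScaler(), MLPClassifier(max_iter=500, random_state=SEED)),
        {"mlpclassifier__hidden_layer_sizes": [(32,), (64, 32)]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # 5-fold cross-validation is run for every combination in the grid
    search = GridSearchCV(estimator, grid, cv=5, scoring="accuracy")
    search.fit(X, y)
    results[name] = (search.best_params_, search.best_score_)
    print(f"{name}: best params {search.best_params_}, mean CV accuracy {search.best_score_:.3f}")
```

`best_score_` is the mean cross-validated accuracy of the best grid point, so it is slightly optimistic; a held-out test set gives a fairer final estimate.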
Feature engineering techniques:
- Character Frequency Analysis - English letter frequency patterns
- N-gram Analysis - Bigram and trigram frequency features
- Statistical Features - Text statistics and entropy
- Frequency Deviation - Deviation from expected English frequencies
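As a rough illustration of these features, the standard-library sketch below computes letter frequencies, n-gram counts, Shannon entropy, the index of coincidence, and a chi-squared deviation from English. The function names and the exact feature set are assumptions, not the project's API, and the frequency table holds approximate, commonly quoted percentages. The last function shows one classical use of the deviation score: recovering a Caesar shift without any trained model.

```python
import math
from collections import Counter

# Approximate relative letter frequencies for English text, in percent
ENGLISH_FREQ = {
    "A": 8.17, "B": 1.49, "C": 2.78, "D": 4.25, "E": 12.70, "F": 2.23, "G": 2.02,
    "H": 6.09, "I": 6.97, "J": 0.15, "K": 0.77, "L": 4.03, "M": 2.41, "N": 6.75,
    "O": 7.51, "P": 1.93, "Q": 0.10, "R": 5.99, "S": 6.33, "T": 9.06, "U": 2.76,
    "V": 0.98, "W": 2.36, "X": 0.15, "Y": 1.97, "Z": 0.07,
}
ALPHABET = sorted(ENGLISH_FREQ)


def letters_only(text):
    """Upper-case the text and keep only A-Z."""
    return [c for c in text.upper() if c in ENGLISH_FREQ]


def letter_frequencies(text):
    """Relative frequency of each letter A..Z as a list of 26 floats."""
    letters = letters_only(text)
    counts = Counter(letters)
    total = len(letters) or 1  # avoid division by zero on empty input
    return [counts[c] / total for c in ALPHABET]


def ngram_counts(text, n):
    """Counts of overlapping letter n-grams (n=2 for bigrams, n=3 for trigrams)."""
    letters = "".join(letters_only(text))
    return Counter(letters[i:i + n] for i in range(len(letters) - n + 1))


def shannon_entropy(text):
    """Entropy of the letter distribution, in bits per letter."""
    return -sum(p * math.log2(p) for p in letter_frequencies(text) if p > 0)


def index_of_coincidence(text):
    """Probability that two letters drawn without replacement are equal."""
    letters = letters_only(text)
    n = len(letters)
    if n < 2:
        return 0.0
    return sum(k * (k - 1) for k in Counter(letters).values()) / (n * (n - 1))


def chi_squared_vs_english(text):
    """Chi-squared distance between observed and expected English letter counts."""
    letters = letters_only(text)
    n = len(letters)
    if n == 0:
        return 0.0
    counts = Counter(letters)
    return sum(
        (counts[c] - n * f / 100) ** 2 / (n * f / 100) for c, f in ENGLISH_FREQ.items()
    )


def feature_vector(text):
    """26 letter frequencies followed by entropy, IoC, and chi-squared (29 values)."""
    return letter_frequencies(text) + [
        shannon_entropy(text),
        index_of_coincidence(text),
        chi_squared_vs_english(text),
    ]


def best_caesar_shift(ciphertext):
    """Return the shift whose decryption deviates least from English frequencies."""
    letters = letters_only(ciphertext)

    def decrypt(shift):
        return "".join(chr((ord(c) - 65 - shift) % 26 + 65) for c in letters)

    return min(range(26), key=lambda s: chi_squared_vs_english(decrypt(s)))
```

Monoalphabetic ciphers only permute the letter distribution, so entropy and the index of coincidence stay at plaintext levels, whereas a Vigenère cipher flattens the distribution. That difference is what makes these statistics plausible inputs for telling cipher types apart.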
ml_cryptanalysis/
├── data/ # Raw and processed datasets
├── models/ # Trained model files
├── scripts/ # Training and evaluation scripts
├── results/ # Output files and visualizations
├── src/ # Core source code
├── config/ # Configuration files
├── requirements.txt # Python dependencies
└── README.md # This file
git clone <repository-url>
cd ml_cryptanalysis
python -m venv venv
# On Windows:
venv\Scripts\activate
# On Unix/MacOS:
source venv/bin/activate
pip install -r requirements.txt
python scripts/generate_data.py
python scripts/train_model.py --cipher caesar --model random_forest
python scripts/evaluate_model.py --model_path models/caesar_rf_model.pkl
jupyter notebook notebooks/cryptanalysis_analysis.ipynb
- Generates encrypted text samples using classical ciphers
- Creates balanced datasets for training
- Supports multiple cipher types and key lengths
- Character frequency analysis
- N-gram feature extraction
- Statistical pattern recognition
- Trains ML models on encrypted text
- Supports multiple algorithms (Random Forest, SVM, Neural Networks)
- Cross-validation and hyperparameter tuning
- Model performance assessment
- Confusion matrix and classification reports
- Visualization of results
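The data generation step could look roughly like the sketch below, which builds a balanced, reproducible set of Caesar ciphertexts. Everything here is hypothetical: the placeholder corpus, the function names, and the choice of the shift as the label are assumptions for illustration, and scripts/generate_data.py may work differently (for example by labelling the cipher type or covering Vigenère and substitution keys as well).

```python
import random
import string
from collections import Counter

ALPHABET = string.ascii_uppercase

# Hypothetical placeholder corpus; a real run would load plaintext from data/
CORPUS = [
    "MEET ME AT THE STATION AT NOON",
    "THE MESSAGE WAS HIDDEN IN PLAIN SIGHT",
    "FREQUENCY ANALYSIS BREAKS SIMPLE CIPHERS",
    "KEEP THE KEY SECRET AT ALL TIMES",
]


def caesar_encrypt(text, shift):
    """Rotate the alphabet by `shift`; non-letters are left untouched."""
    table = str.maketrans(ALPHABET, ALPHABET[shift:] + ALPHABET[:shift])
    return text.upper().translate(table)


def generate_caesar_dataset(samples_per_shift, seed=42):
    """Return shuffled (ciphertext, shift) pairs with equal counts per shift.

    A private random.Random(seed) keeps the output identical across runs
    without touching the global random state.
    """
    rng = random.Random(seed)
    rows = []
    for shift in range(1, 26):  # skip shift 0, which leaves text unchanged
        for _ in range(samples_per_shift):
            plaintext = rng.choice(CORPUS)
            rows.append((caesar_encrypt(plaintext, shift), shift))
    rng.shuffle(rows)
    return rows


dataset = generate_caesar_dataset(samples_per_shift=4)
print(len(dataset), "samples,", len(Counter(s for _, s in dataset)), "classes")
```

Generating the same number of samples for every key keeps the classes balanced, so plain accuracy remains a meaningful metric during evaluation.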
The project achieves:
- Caesar Cipher: ~95% accuracy with frequency analysis
- Vigenère Cipher: ~85% accuracy with n-gram features
- Substitution Cipher: ~70% accuracy with advanced features
This project is licensed under the MIT License - see the LICENSE file for details.
Additional components:
- XGBoost + Optuna hyperparameter tuning in scripts/train_model.py
- Starter character-level Transformer training script in scripts/train_transformer.py
- Advanced evaluator scripts/scripts/evaluate_model_advanced.py that saves a confusion matrix and classification report
- CLI placeholder cryptanalysis.py and Streamlit demo app_streamlit.py
- Dockerfile, requirements-advanced.txt, GitHub Actions CI, and notebook placeholders