Simulate gene expression data & train a classifier
This project demonstrates how machine learning can be applied to a small, synthetic gene expression dataset to classify samples as "Healthy" or "Disease".
- Load a synthetic gene expression dataset
- Split into training and testing sets
- Standardize features
- Train a Random Forest classifier
- Evaluate model performance (accuracy + classification report)
- Python
- pandas
- scikit-learn
- Machine learning fundamentals
- Feature scaling
- Classification model building
- Model evaluation
- Bioinformatics-relevant ML workflow
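
The workflow steps listed above (load, split, standardize, train, evaluate) can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the helper names `make_synthetic_expression` and `run_workflow`, the `gene_*` column names, the `label` column, the 80/20 split, and the way the synthetic data is generated are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def make_synthetic_expression(n_samples=100, n_genes=20, seed=0):
    """Hypothetical stand-in for the project's dataset.

    Half the samples are "Healthy", half are "Disease"; the first five
    genes are shifted upward in "Disease" samples so there is signal to learn.
    """
    rng = np.random.default_rng(seed)
    labels = np.repeat(["Healthy", "Disease"], n_samples // 2)
    values = rng.normal(loc=5.0, scale=1.0, size=(len(labels), n_genes))
    values[labels == "Disease", :5] += 2.0
    df = pd.DataFrame(values, columns=[f"gene_{i}" for i in range(n_genes)])
    df["label"] = labels
    return df


def run_workflow(df, seed=0):
    """Split, standardize, train a Random Forest, and evaluate it."""
    X = df.drop(columns="label")
    y = df["label"]

    # Hold out 20% of the samples, keeping the class balance in both splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )

    # Fit the scaler on the training data only, then apply it to both splits.
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)

    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(X_train_scaled, y_train)
    y_pred = clf.predict(X_test_scaled)

    accuracy = accuracy_score(y_test, y_pred)
    report = classification_report(y_test, y_pred)
    return accuracy, report


if __name__ == "__main__":
    data = make_synthetic_expression()
    accuracy, report = run_workflow(data)
    print(f"Accuracy: {accuracy:.2f}")
    print(report)
```

Fitting the scaler on the training split alone keeps test-set statistics out of the model. Random Forests do not strictly need standardized inputs, so the scaling step here mainly keeps the pipeline reusable if the classifier is later swapped for a scale-sensitive model.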