This project implements the K-Nearest Neighbors (KNN) algorithm for gender classification based on height, weight, and age. Several distance metrics (Euclidean, Manhattan, and Minkowski) are applied to classify individuals as Male or Female. The project evaluates the model's performance for different values of `K` and identifies the most effective feature set for accurate predictions.
- Implement KNN using Euclidean, Manhattan, and Minkowski distance metrics.
- Evaluate the impact of different `K` values on classification accuracy.
- Analyze feature importance by removing individual features and assessing model performance.
- Use cross-validation to evaluate model robustness.
- Programming Language: Python
- Libraries:
  - `numpy`: For numerical computations.
  - `math`: For mathematical operations.
- Training Data (`Training_Data.txt` and `Training_Data.csv`):
  - Contains height, weight, age, and gender labels (`M` for Male, `W` for Female).
  - Example: `(( 1.6530190426733, 72.871146648479, 24), W) (( 1.6471384909498, 72.612785314988, 34), W)`
- Test Data (`Test_Data.txt` and `Test_Data.csv`):
  - Contains height, weight, and age without labels for prediction.
  - Example: `(1.62065758929, 59.376557437583, 32)`
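The record layouts shown above could be read with a small regular-expression parser. This is only a minimal sketch: the names `TRAIN_RECORD`, `TEST_RECORD`, `parse_training`, and `parse_test` are hypothetical, and it assumes the `.txt` files hold records in exactly the tuple form of the examples.

```python
import re

# Hypothetical patterns, assuming records look like the examples above.
NUM = r"\s*([0-9]+(?:\.[0-9]+)?)\s*"   # height or weight
AGE = r"\s*([0-9]+)\s*"                # age as a whole number
TRAIN_RECORD = re.compile(r"\(\(" + NUM + "," + NUM + "," + AGE + r"\)\s*,\s*([MW])\s*\)")
TEST_RECORD = re.compile(r"\(" + NUM + "," + NUM + "," + AGE + r"\)")

def parse_training(text):
    """Return a list of ((height, weight, age), label) records."""
    return [((float(h), float(w), int(a)), label)
            for h, w, a, label in TRAIN_RECORD.findall(text)]

def parse_test(text):
    """Return a list of unlabeled (height, weight, age) tuples."""
    return [(float(h), float(w), int(a)) for h, w, a in TEST_RECORD.findall(text)]

sample = ("(( 1.6530190426733, 72.871146648479, 24), W) "
          "(( 1.6471384909498, 72.612785314988, 34), W)")
records = parse_training(sample)
queries = parse_test("(1.62065758929, 59.376557437583, 32)")
```

Because `findall` scans the whole string, the same functions work whether records are separated by spaces or by newlines.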
- Implemented KNN using the following distance metrics:
  - Euclidean Distance: $\text{distance} = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$
  - Manhattan Distance: $\text{distance} = \sum_{i=1}^{n} |x_i - y_i|$
  - Minkowski Distance: $\text{distance} = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$
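The three formulas translate almost directly into Python. The sketch below is illustrative rather than the project's actual code, and the function names are assumptions. It also shows that Minkowski distance reduces to Manhattan for `p=1` and to Euclidean for `p=2`.

```python
import math

def euclidean(x, y):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def minkowski(x, y, p):
    """Generalized distance of order p (p=1 Manhattan, p=2 Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

a = (0.0, 0.0, 0.0)
b = (3.0, 4.0, 0.0)
d_euclid = euclidean(a, b)      # 5.0 (the 3-4-5 triangle)
d_manhattan = manhattan(a, b)   # 7.0
d_mink3 = minkowski(a, b, 3)    # cube root of 27 + 64 = 91
```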
- Tested the model with various values of `K` (e.g., 1, 3, 5, 7).
- Observed classification accuracy for each `K` using cross-validation.
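One way to compare `K` values with cross-validation is sketched below. The fold scheme (interleaved folds, no shuffling), the function names, and the toy records are all assumptions made for illustration. The data is invented and is not taken from `Training_Data.txt`.

```python
import math
from collections import Counter

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_predict(train, query, k, dist=euclidean):
    """Majority vote among the k training records closest to `query`."""
    nearest = sorted(train, key=lambda rec: dist(rec[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cross_validate(data, k, folds=5, dist=euclidean):
    """Fraction of records classified correctly when each fold is held out in turn."""
    correct = 0
    for i in range(folds):
        held_out = data[i::folds]
        train = [rec for j, rec in enumerate(data) if j % folds != i]
        correct += sum(knn_predict(train, x, k, dist) == y for x, y in held_out)
    return correct / len(data)

# Invented (height, weight) records, chosen to be clearly separable.
toy = [
    ((1.80, 82.0), "M"), ((1.60, 58.0), "W"),
    ((1.78, 80.0), "M"), ((1.62, 60.0), "W"),
    ((1.82, 85.0), "M"), ((1.58, 55.0), "W"),
    ((1.76, 79.0), "M"), ((1.64, 62.0), "W"),
    ((1.84, 88.0), "M"), ((1.61, 59.0), "W"),
]

accuracies = {k: cross_validate(toy, k) for k in (1, 3, 5, 7)}
for k, acc in accuracies.items():
    print(f"K={k}: {acc:.0%}")
```

Odd values of `K` avoid tied votes when there are only two classes, which is presumably why the values tested are all odd.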
- Removed features (e.g., age) to evaluate their impact on model performance.
- Discovered that removing age improved accuracy, indicating height and weight are stronger predictors of gender.
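A feature-removal experiment of this kind can be sketched by re-scoring the classifier on each reduced feature set. The example below uses leave-one-out evaluation for brevity, which may differ from the cross-validation scheme the project used. `loo_accuracy`, `FEATURES`, and the toy records are hypothetical, and the data is invented, so the printed numbers say nothing about the real dataset.

```python
import math
from collections import Counter

FEATURES = ("height", "weight", "age")

def loo_accuracy(data, k, keep):
    """Leave-one-out KNN accuracy using only the feature indices in `keep`."""
    def project(x):
        return tuple(x[i] for i in keep)

    correct = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        nearest = sorted(rest, key=lambda rec: math.dist(project(rec[0]), project(x)))[:k]
        vote = Counter(label for _, label in nearest).most_common(1)[0][0]
        correct += vote == y
    return correct / len(data)

# Invented (height, weight, age) records for illustration only.
toy = [
    ((1.80, 82.0, 25), "M"), ((1.60, 58.0, 40), "W"),
    ((1.78, 80.0, 52), "M"), ((1.62, 60.0, 23), "W"),
    ((1.82, 85.0, 31), "M"), ((1.58, 55.0, 60), "W"),
]

print(f"all features: {loo_accuracy(toy, 1, (0, 1, 2)):.2f}")
for dropped, name in enumerate(FEATURES):
    keep = tuple(i for i in range(len(FEATURES)) if i != dropped)
    print(f"without {name}: {loo_accuracy(toy, 1, keep):.2f}")
```

No feature scaling is applied here, so weight and age, which span tens of units, contribute far more to the distance than height, which varies by fractions of a unit. That imbalance is one plausible reason an unrelated feature such as age can hurt accuracy.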
| Distance Metric | K=1 | K=3 | K=5 | K=7 | K=9 |
|---|---|---|---|---|---|
| Euclidean | 100% | 95% | 92% | 90% | 88% |
| Manhattan | 98% | 96% | 93% | 91% | 89% |
| Minkowski (p=3) | 99% | 96% | 93% | 90% | 89% |
- Removing `Age` improved model accuracy across all metrics and `K` values.
- Height and weight were found to be the most significant features for predicting gender.