This project classifies the pet category and breed category of shelter pets, given the following features:
- pet_id: Unique Pet Id
- issue_date: Date on which the pet was issued to the shelter
- listing_date: Date when the pet arrived at the shelter
- condition: Condition of the pet
- color_type: Color of the pet
- length(m): Length of the pet (in meters)
- height(cm): Height of the pet (in centimeters)
- X1,X2: Anonymous columns
- breed_category: Breed category of the pet (target variable)
- pet_category: Category of the pet (target variable)
- s1: weighted average f1_score for breed category
- s2: weighted average f1_score for pet category
- score = 100*(s1+s2)/2
- train.csv and test.csv files contain the training and testing data, respectively
- PetAdoption.ipynb walks through the complete project.
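
The score defined by s1 and s2 above can be sketched in plain Python as follows. This is a minimal illustration, not the competition's official scorer: the function names `weighted_f1` and `competition_score` are hypothetical, and scikit-learn's `f1_score` with `average="weighted"` computes the same per-target metric.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Weighted-average F1: per-class F1, weighted by class support in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        # F1 = 2TP / (2TP + FP + FN); the denominator is > 0 because n > 0.
        f1 = 2 * tp / (2 * tp + fp + fn)
        score += f1 * n / total
    return score


def competition_score(breed_true, breed_pred, pet_true, pet_pred):
    """score = 100 * (s1 + s2) / 2, with s1 for breed and s2 for pet category."""
    s1 = weighted_f1(breed_true, breed_pred)
    s2 = weighted_f1(pet_true, pet_pred)
    return 100 * (s1 + s2) / 2
```

Perfect predictions on both targets give a score of 100; any misclassification in either target lowers the average.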
- Converted issue_date and listing_date to datetime and computed the difference between the two dates
- Applied a OneHotEncoder to the color_type column, since it holds categorical values
- Converted length from m to cm by multiplying the column by 100
- Replaced all length values of 0 with the mean of the length column
- Replaced the NaN values in the condition column with 3 and converted the column to the int data type
- Dropped the issue_date, listing_date, and color_type columns
- Applied StandardScaler to X1, X2, length, and height
- Removed the color_type categories "Black Tiger" and "Brown Tiger" from the training data, since they were not present in the test data
- Created a heatmap to inspect the correlation among all the features
- Finally, trained an LGBM classifier on the data, generated the predictions, and obtained an F1 score of 90.50%
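
The cleaning steps above could look roughly like the sketch below. It is an approximation using pandas only, not the notebook's actual code: `pd.get_dummies` stands in for OneHotEncoder, a manual zero-mean/unit-variance transform stands in for StandardScaler, and the names `preprocess`, `days_between`, `length(cm)`, and the `color_` prefix are assumptions made for illustration. The mean used to fill zero lengths is taken over the whole column, zeros included, which is one reading of the step as described.

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the listed cleaning steps to a copy of df (illustrative sketch)."""
    out = df.copy()

    # Days between issue and listing ("days_between" is a hypothetical name).
    issue = pd.to_datetime(out["issue_date"])
    listing = pd.to_datetime(out["listing_date"])
    out["days_between"] = (listing - issue).dt.days

    # Length: meters -> centimeters, then replace zeros with the column mean.
    length_cm = out.pop("length(m)") * 100
    out["length(cm)"] = length_cm.mask(length_cm == 0, length_cm.mean())

    # Missing condition values become 3; the column is then cast to int.
    out["condition"] = out["condition"].fillna(3).astype(int)

    # One-hot encode color_type, then drop the raw date and color columns.
    dummies = pd.get_dummies(out["color_type"], prefix="color", dtype=int)
    out = pd.concat([out, dummies], axis=1)
    out = out.drop(columns=["issue_date", "listing_date", "color_type"])

    # Standardize to zero mean and unit variance (population std, ddof=0).
    for col in ["X1", "X2", "length(cm)", "height(cm)"]:
        out[col] = (out[col] - out[col].mean()) / out[col].std(ddof=0)

    return out
```

A call such as `preprocess(pd.read_csv("train.csv"))` would return the cleaned frame. In practice the scaling statistics would normally be computed on the training data and reused for the test data, which this sketch does not do.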
The dataset comes from a HackerEarth competition, so credit for the training data goes to https://www.hackerearth.com/challenges/competitive/hackerearth-machine-learning-challenge-pet-adoption/problems/