
Kaggle Competitions

Overview

This repository acts as a central point for all the Kaggle competitions I compete in. It includes a template branch I can clone easily, which contains my preferred file structure for data science projects. The competitions are broadly split into two groups: those with monetary rewards and the Playground Series (no monetary reward).

Table of Contents

Playground:

Monetary:

Competition Summary & Links

Tabular Playground Series

    • Missing value prediction on a synthetic dataset of 1 million samples and 81 features
    • Leveraged EDA to reduce training time and to inform feature engineering
    • Designed a multi-head multilayer perceptron (MLP) with skip connections and Mish activation (see the sketch below)
    • Things I learned:
      • Many different techniques for imputing missing data
      • How imperative a thorough EDA is
      • How a different MLP architecture can reduce training time while maintaining the same accuracy
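
A minimal sketch of that kind of multi-head MLP, assuming TensorFlow/Keras; the layer width, number of residual blocks and number of heads are illustrative, not the exact architecture used in the competition:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def mish(x):
    # Mish activation: x * tanh(softplus(x))
    return x * tf.math.tanh(tf.math.softplus(x))

def build_multihead_mlp(n_features, n_targets, width=256, n_blocks=2, n_heads=3):
    inputs = layers.Input(shape=(n_features,))
    x = layers.Dense(width, activation=mish)(inputs)

    # Hidden blocks with skip connections around them
    for _ in range(n_blocks):
        h = layers.Dense(width, activation=mish)(x)
        h = layers.Dense(width, activation=mish)(h)
        x = layers.Add()([x, h])

    # Several output heads averaged into a single prediction
    heads = [layers.Dense(n_targets, name=f"head_{i}")(x) for i in range(n_heads)]
    outputs = layers.Average()(heads)
    return Model(inputs, outputs)

# Illustrative sizes: 81 features in, one imputed value per feature out
model = build_multihead_mlp(n_features=81, n_targets=81)
model.compile(optimizer="adam", loss="mse")
```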
    • Cluster prediction on a synthetic dataset of 98,000 samples and 29 features
    • Because there was no ground-truth data and the evaluation metric was the Adjusted Rand Score, a brute-force method was used instead of the usual cross-validation
    • Two-stage approach to predictions (see the sketch below):
      • Use a clustering model to predict clusters
      • Train classifiers on the high-confidence predictions from the clustering model and ensemble them
    • Things I learned:
      • Some functionality of the SK-Lego library, which was necessary for the score I achieved
      • How to implement the pseudo-labelling technique
      • How to make a custom soft-voting ensemble
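
A rough sketch of the two-stage idea, assuming scikit-learn; the clustering model, the classifiers, the cluster count and the confidence threshold are assumptions for illustration, not the exact setup used:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture
from sklearn.ensemble import ExtraTreesClassifier, HistGradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

def two_stage_clustering(X, n_clusters=7, threshold=0.9, seed=0):
    # Stage 1: an unsupervised model proposes cluster labels with probabilities
    gmm = BayesianGaussianMixture(n_components=n_clusters, random_state=seed).fit(X)
    proba = gmm.predict_proba(X)
    labels = proba.argmax(axis=1)

    # Keep only high-confidence assignments as pseudo-labels
    confident = proba.max(axis=1) >= threshold
    X_conf, y_conf = X[confident], labels[confident]

    # Stage 2: classifiers trained on the pseudo-labels, combined by soft voting
    clfs = [
        ExtraTreesClassifier(n_estimators=300, random_state=seed),
        HistGradientBoostingClassifier(random_state=seed),
        LogisticRegression(max_iter=1000),
    ]
    mean_proba = np.mean([c.fit(X_conf, y_conf).predict_proba(X) for c in clfs], axis=0)
    return clfs[0].classes_[mean_proba.argmax(axis=1)]
```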
    • Failure prediction on a synthetic dataset of 26,500 samples and 26 features that mimicked a real-world product test
    • A tricky competition: there were many missing values, and some features that were strongly correlated with the target still did not produce great results
    • The final model was an untuned logistic regression with a few engineered features (see the sketch below)
    • Things I learned:
      • The importance of building a robust cross-validation scheme; this was tricky here because the test set contained categories and groups that were not present in the training data
      • Missing data can be engineered into features
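
A hedged sketch of that final approach, assuming scikit-learn; missing-value indicators are added as features, and a group-aware split stands in for the unseen-group problem (the scoring metric and fold count are assumptions for illustration):

```python
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_val_score

def grouped_cv_scores(X, y, groups):
    # Missing-value indicator columns become features in their own right,
    # followed by simple imputation, scaling and an untuned logistic regression
    model = make_pipeline(
        SimpleImputer(strategy="median", add_indicator=True),
        StandardScaler(),
        LogisticRegression(max_iter=1000),
    )
    # GroupKFold holds out whole groups, mimicking a test set whose
    # groups never appear in the training data
    cv = GroupKFold(n_splits=5)
    return cross_val_score(model, X, y, groups=groups, cv=cv, scoring="roc_auc")
```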

Monetary Rewards
