This repository acts as a central hub for all the Kaggle competitions I compete in. A template branch, which I clone for each new competition, contains my preferred file structure for data science projects. The competitions fall broadly into two groups: those with monetary rewards and the Playground Series (no monetary reward).
Playground:
- S2E6 - Imputation (132/884)
- S2E7 - Clustering (42/1253)
- S2E8 - Regularization (519/1888)
- S3E10 - Nonlinearity (6/807)
Monetary:
- RSNA - Breast Cancer (962/1687)
- Women in Data Science
- BirdCLEF 2023
- IceCube - Neutrinos in Deep Ice
- Google - Isolated Sign Language Recognition
S2E6 - Imputation (132/884)
- Missing value prediction on a synthetic dataset of 1 million samples and 81 features
- Leveraged EDA to reduce training time and apply feature engineering
- Designed a multi-head multilayer perceptron (MLP) with skip connections and Mish activations
- Things I learned:
- Many different techniques to impute missing data
- How imperative a thorough EDA is
- How different MLP architectures can reduce training time while maintaining the same accuracy
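The multi-head MLP idea above can be sketched as a NumPy forward pass. This is a minimal illustration, not the competition model: the layer sizes, weights, and two-head split are hypothetical, chosen only to show the Mish activation and a skip connection.

```python
import numpy as np

def mish(x):
    # Mish activation: x * tanh(softplus(x))
    return x * np.tanh(np.log1p(np.exp(x)))

rng = np.random.default_rng(0)

def dense(x, w, b):
    return x @ w + b

# Toy dimensions; the real model was much larger
n_features, hidden = 8, 16
x = rng.normal(size=(4, n_features))

w1 = rng.normal(size=(n_features, hidden)) * 0.1
b1 = np.zeros(hidden)
w2 = rng.normal(size=(hidden, hidden)) * 0.1
b2 = np.zeros(hidden)

h1 = mish(dense(x, w1, b1))
# Skip connection: add the block's input back to its output
h2 = mish(dense(h1, w2, b2)) + h1

# Two heads sharing the same trunk, e.g. one per predicted column
w_head_a = rng.normal(size=(hidden, 1)) * 0.1
w_head_b = rng.normal(size=(hidden, 1)) * 0.1
out_a = dense(h2, w_head_a, np.zeros(1))
out_b = dense(h2, w_head_b, np.zeros(1))
print(out_a.shape, out_b.shape)  # (4, 1) (4, 1)
```

Sharing a trunk across heads is what lets one network impute several target columns at once instead of training a model per column.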
S2E7 - Clustering (42/1253)
- Predict clusters on a synthetic dataset of 98,000 samples and 29 features
- With no ground-truth labels available and Adjusted Rand Score as the evaluation metric, a brute-force search was used instead of the usual cross-validation
- Two-stage approach to predictions:
- Use a clustering model to predict clusters
- Train classifiers on the high-confidence predictions from the clustering model and ensemble them
- Things I learned:
- Some functionality of the scikit-lego library, which was necessary for the score I achieved
- How to implement the pseudo-labelling technique
- How to make a custom soft voting ensemble
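The pseudo-labelling and soft-voting steps above can be sketched in NumPy. The probability matrices, threshold, and cluster count here are hypothetical placeholders, not the competition settings:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical cluster-membership probabilities from a clustering model
# (5 samples, 3 clusters); each row sums to 1
p = rng.dirichlet(np.ones(3), size=5)

# Pseudo-labelling: keep only high-confidence predictions as training labels
threshold = 0.8
confident = p.max(axis=1) >= threshold
pseudo_labels = p.argmax(axis=1)[confident]

# Custom soft vote: average each model's class probabilities, then argmax
def soft_vote(prob_list, weights=None):
    stacked = np.stack(prob_list)  # (n_models, n_samples, n_classes)
    avg = np.average(stacked, axis=0, weights=weights)
    return avg.argmax(axis=1)

# A second hypothetical model's probabilities, to make the vote non-trivial
q = rng.dirichlet(np.ones(3), size=5)
preds = soft_vote([p, q], weights=[0.6, 0.4])
print(preds.shape)  # (5,)
```

Averaging probabilities rather than hard labels lets a confident model outvote an uncertain one, which is the point of soft over hard voting.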
S2E8 - Regularization (519/1888)
- Predict failure on a synthetic dataset of 26,500 samples and 26 features that mimicked a real-world product test
- A tricky competition with many missing values; some features were strongly correlated with the target yet did not produce great results
- The final model was untuned logistic regression with a few engineered features
- Things I learned:
- The importance of creating a robust cross-validation method, tricky here because the test set contained categorical values and groups that were not in the training data
- Missing data can be engineered into features
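Turning missingness itself into features can be sketched as follows. The toy matrix and the mean imputation are illustrative assumptions, not the competition pipeline:

```python
import numpy as np

# Hypothetical feature matrix with NaNs standing in for missing measurements
X = np.array([[1.0, np.nan, 3.0],
              [np.nan, 2.0, 3.0],
              [4.0, 5.0, np.nan]])

# One binary indicator column per feature: was this value missing?
indicators = np.isnan(X).astype(float)

# A row-level feature: how many measurements are missing per sample
n_missing = np.isnan(X).sum(axis=1, keepdims=True)

# Simple mean imputation so the original columns stay usable downstream
col_means = np.nanmean(X, axis=0)
X_imputed = np.where(np.isnan(X), col_means, X)

X_engineered = np.hstack([X_imputed, indicators, n_missing])
print(X_engineered.shape)  # (3, 7)
```

If missingness correlates with the target (e.g. a sensor that fails under stress), the indicator columns carry signal that plain imputation would throw away.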
S3E10 - Nonlinearity (6/807)
RSNA - Breast Cancer (962/1687)
- A 314 GB dataset of ~55,000 images from ~12,000 patients in the form of DICOM files, with an accompanying auxiliary CSV