This repository contains my code for the Jigsaw Unintended Bias in Toxicity Classification competition on Kaggle.
# 7th Place Solution for Jigsaw Unintended Bias in Toxicity Classification
Team: Abhishek Thakur, Duy, R0seNb1att, atfujita
All models (team)

- Public LB: 0.94729 (3rd)
- Private LB: 0.94660 (7th)

My models (average of 5 models; a blending sketch is shown below)

- Public LB: 0.94719
- Private LB: 0.94651
Thanks to Abhishek's and Duy's wonderful models and support, I was able to achieve better results.
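The "My models" score comes from averaging the predictions of the five models described below. Here is a minimal sketch of that blending; the equal-weight mean is an assumption (the exact blend weights are not documented here), and the prediction files are hypothetical placeholders:

```python
import numpy as np

# Hypothetical per-model test predictions: one array of probabilities per model
# (the LSTM model and the four BERT variants described below).
preds = [np.load(f"model_{i}_test_preds.npy") for i in range(5)]
blend = np.mean(preds, axis=0)  # equal-weight average (assumed)
```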
- The particularly important libraries are listed in `requirements.txt`.
I created five models: one LSTM model and four BERT models.

## LSTM
- Based on my model from the Quora Insincere Questions Classification competition
- Architecture: LSTM + GRU + self-attention + max pooling (a sketch is shown after this list)
- Word embeddings: concatenation of GloVe and fastText vectors
- Optimizer: AdamW
- Training:
  - max_len = 220
  - n_splits = 10
  - batch_size = 512
  - train_epochs = 7
  - base_lr, max_lr = 0.0005, 0.003
  - weight decay = 0.0001
  - Learning rate schedule: CyclicLR
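As a reference, here is a minimal PyTorch sketch of how these components could be wired together. Only the optimizer, learning rates, weight decay, and CyclicLR schedule come from the list above; the hidden size, the attention implementation, and the embedding dimensionality (GloVe 300d + fastText 300d) are my assumptions, not the exact competition code.

```python
import torch
import torch.nn as nn


class SelfAttention(nn.Module):
    """Simple additive self-attention pooling over the time dimension."""

    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):                       # x: (batch, seq, dim)
        weights = torch.softmax(self.score(x), dim=1)
        return (weights * x).sum(dim=1)         # (batch, dim)


class LstmGruModel(nn.Module):
    def __init__(self, embedding_matrix, hidden=128):
        super().__init__()
        vocab_size, emb_dim = embedding_matrix.shape   # GloVe + fastText concat
        self.embedding = nn.Embedding.from_pretrained(
            torch.tensor(embedding_matrix, dtype=torch.float), freeze=True)
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.gru = nn.GRU(2 * hidden, hidden, bidirectional=True, batch_first=True)
        self.attention = SelfAttention(2 * hidden)
        # 8 outputs: main target + 7 auxiliary targets, matching the loss below
        self.out = nn.Linear(4 * hidden, 8)

    def forward(self, tokens):                  # tokens: (batch, max_len)
        h = self.embedding(tokens)
        h, _ = self.lstm(h)
        h, _ = self.gru(h)
        att = self.attention(h)                 # (batch, 2 * hidden)
        max_pool, _ = torch.max(h, dim=1)       # (batch, 2 * hidden)
        return self.out(torch.cat([att, max_pool], dim=1))


# embedding_matrix is a placeholder (vocab, 600) array of GloVe+fastText vectors.
model = LstmGruModel(embedding_matrix)
optimizer = torch.optim.AdamW(model.parameters(), lr=0.0005, weight_decay=0.0001)
# cycle_momentum=False is required for Adam-family optimizers.
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=0.0005, max_lr=0.003, cycle_momentum=False)
```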
## BERT
- The model is based on Yuval Reina's great kernel.
- My changes are the loss function and the preprocessing.
- I created 4 BERT models:
  - BERT-Base Uncased
  - BERT-Base Cased
  - BERT-Large Uncased (Whole Word Masking)
  - BERT-Large Cased (Whole Word Masking)
- Training (a gradient-accumulation sketch follows this list):
  - max_len = 220
  - train samples = 1.7M, validation samples = 0.1M
  - batch_size = 32 (Base), 4 (Large)
  - accumulation_steps = 1 (Base), 16 (Large)
  - train_epochs = 2
  - lr = 2e-5
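With batch_size = 4 and accumulation_steps = 16, each optimizer step for the Large models sees an effective batch of 64 examples. Below is a minimal sketch of that gradient-accumulation pattern; `model`, `train_loader`, and the optimizer choice are placeholders (the actual kernel-based training code differs), and `custom_loss` is the loss defined in the next section.

```python
import torch

accumulation_steps = 16            # 1 for the Base models
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)  # stand-in optimizer

model.train()
optimizer.zero_grad()
for step, (tokens, targets) in enumerate(train_loader):
    logits = model(tokens)
    # Scale the loss so the accumulated gradient equals the full-batch average.
    loss = custom_loss(logits, targets) / accumulation_steps
    loss.backward()                # gradients accumulate across micro-batches
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()           # one update per 16 micro-batches
        optimizer.zero_grad()
```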
## Loss Function

The loss function was very important in this competition; in fact, each of the winning teams used a different one. My loss function is below.
```python
import numpy as np
import torch.nn as nn

y_columns = ['target']
# Auxiliary targets: the main target plus the six toxicity subtype columns.
y_aux_train = train_df[['target', 'severe_toxicity', 'obscene',
                        'identity_attack', 'insult',
                        'threat',
                        'sexual_explicit']]
y_aux_train = y_aux_train.fillna(0)

identity_columns = [
    'male', 'female', 'homosexual_gay_or_lesbian', 'christian', 'jewish',
    'muslim', 'black', 'white', 'psychiatric_or_mental_illness']

# Sample weights mirror the competition metric: every example starts at 1/4,
# and the subgroup, BPSN, and BNSP terms each add another 1/4.
# Overall
weights = np.ones((len(train_df),)) / 4
# Subgroup
weights += (train_df[identity_columns].fillna(0).values >= 0.5).sum(
    axis=1).astype(bool).astype(int) / 4
# Background Positive, Subgroup Negative
weights += (((train_df['target'].values >= 0.5).astype(bool).astype(int) +
             (train_df[identity_columns].fillna(0).values < 0.5).sum(
                 axis=1).astype(bool).astype(int)) > 1).astype(
    bool).astype(int) / 4
# Background Negative, Subgroup Positive
weights += (((train_df['target'].values < 0.5).astype(bool).astype(int) +
             (train_df[identity_columns].fillna(0).values >= 0.5).sum(
                 axis=1).astype(bool).astype(int)) > 1).astype(
    bool).astype(int) / 4

# Column 0: binarized main target, column 1: per-sample weight,
# columns 2-8: auxiliary targets.
y_train = np.vstack(
    [(train_df['target'].values >= 0.5).astype(int), weights]).T
y_train = np.hstack([y_train, y_aux_train])


def custom_loss(data, targets):
    '''Weighted BCE on the 'target' column plus plain BCE on the aux targets.'''
    bce_loss_1 = nn.BCEWithLogitsLoss(
        weight=targets[:, 1:2])(data[:, :1], targets[:, :1])
    bce_loss_2 = nn.BCEWithLogitsLoss()(data[:, 1:2], targets[:, 2:3])
    bce_loss_3 = nn.BCEWithLogitsLoss()(data[:, 2:3], targets[:, 3:4])
    bce_loss_4 = nn.BCEWithLogitsLoss()(data[:, 3:4], targets[:, 4:5])
    bce_loss_5 = nn.BCEWithLogitsLoss()(data[:, 4:5], targets[:, 5:6])
    bce_loss_6 = nn.BCEWithLogitsLoss()(data[:, 5:6], targets[:, 6:7])
    bce_loss_7 = nn.BCEWithLogitsLoss()(data[:, 6:7], targets[:, 7:8])
    bce_loss_8 = nn.BCEWithLogitsLoss()(data[:, 7:8], targets[:, 8:9])
    return (bce_loss_1 + bce_loss_2 + bce_loss_3 + bce_loss_4
            + bce_loss_5 + bce_loss_6 + bce_loss_7 + bce_loss_8)
```
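As a quick sanity check, `custom_loss` can be exercised on random tensors (the values are meaningless; this only verifies the column slicing and shapes):

```python
import torch

batch = 16
logits = torch.randn(batch, 8)    # model output: main target + 7 aux columns
targets = torch.rand(batch, 9)    # columns: [target, weight, 7 aux targets]
targets[:, 0] = (targets[:, 0] >= 0.5).float()  # binarized main target

print(custom_loss(logits, targets))  # a scalar tensor
```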