
Would you please release the hyper-parameters for FreeLB based on ALBERT (hugging-face) #9

Closed
FFYYang opened this issue May 13, 2020 · 6 comments

Comments


FFYYang commented May 13, 2020

There are only 4 tasks' hyperparameters in this file; would you please release the others?

@PantherYan

> There are only 4 tasks' hyperparameters in this file; would you please release the others?

Do you have any comments on the scale of the norm (the $\epsilon$-ball)?
Does it relate directly to the adversarial effect, or to the model's general capability?

FFYYang (Author) commented May 15, 2020

@PantherYan
In my opinion, the selection of epsilon is tricky and depends on your task's dataset: a large epsilon may lead the generated adversarial example to change the gold label, while a small epsilon cannot threaten the model.
I am curious:

  1. Is there any guiding rule for selecting a proper epsilon?
  2. How can we make sure the adversarial perturbation won't change the sentence's semantics or the gold label?

@PantherYan

@YasinQiu
Thanks for your reply.

#1. I'll leave this question to @zhuchen03.
#2. In my opinion, the adversarial perturbation is similar to a denoising autoencoder: adding noise builds robustness, or adds general capability, to the language model.

I will read more of the literature to resolve our confusion.
Until yesterday, I had been training my implementation of FreeLB in a plugin format without the dropout mask (https://github.com/zhuchen03/FreeLB/issues/8#issuecomment-627669810). It worked well with one setting of hyperparameters. But after I added the dropout-mask implementation and changed to another set of hyperparameters, the FreeLB adversarial training went the wrong way: accuracy falls as training proceeds.

This confused me a lot. I will figure out why and post an update.
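For reference, here is a minimal sketch of such a plugin-style FreeLB loop against a HuggingFace model. It is an illustration under assumptions, not the repo's exact code: the argument names (`adv_lr`, `adv_steps`, `adv_init_mag`, `adv_max_norm`) are placeholders mirroring the launch flags, the model is assumed to accept `inputs_embeds`/`labels` and return an output with `.loss`, and each forward pass samples fresh dropout masks, which is exactly the difference discussed above.

```python
import torch

def freelb_step(model, input_ids, attention_mask, labels,
                adv_lr=1e-1, adv_steps=3, adv_init_mag=2e-2, adv_max_norm=0.0):
    """One FreeLB-style training step (sketch); the caller runs optimizer.step() afterwards."""
    # Shape reference for the perturbation; detached so it carries no graph.
    with torch.no_grad():
        ref = model.get_input_embeddings()(input_ids)
    delta = torch.zeros_like(ref).uniform_(-adv_init_mag, adv_init_mag)
    delta.requires_grad_()
    for _ in range(adv_steps):
        # Recompute embeddings each step so every backward sees a fresh graph.
        embeds = model.get_input_embeddings()(input_ids)
        out = model(inputs_embeds=embeds + delta,
                    attention_mask=attention_mask, labels=labels)
        loss = out.loss / adv_steps
        loss.backward()  # accumulates parameter gradients across ascent steps
        g = delta.grad.detach()
        # Normalized gradient ascent on the perturbation.
        gn = g.view(g.size(0), -1).norm(p=2, dim=1).view(-1, 1, 1).clamp_min(1e-8)
        delta = (delta + adv_lr * g / gn).detach()
        if adv_max_norm > 0:  # project back into the epsilon-ball
            dn = delta.view(delta.size(0), -1).norm(p=2, dim=1).view(-1, 1, 1).clamp_min(1e-8)
            delta = delta * (adv_max_norm / dn).clamp(max=1.0)
        delta.requires_grad_()
```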


PantherYan commented May 16, 2020

@YasinQiu
The $\epsilon$ is small.
There are many papers explaining how to choose the minimum, or different scales, of $\epsilon$. Here are some for your reference:

  1. DeepFool: a simple and accurate method to fool deep neural networks
  2. FGSM, which attacks with different scales of $\epsilon$; the author gives a reference in the launch files.

As for the explicit value: it should be around 1e-1?
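To make the FGSM reference concrete, here is a minimal, generic FGSM step on embeddings (an illustration of the attack family, not FreeLB itself; `epsilon` is the attack scale discussed above, and `fgsm_perturb` is a hypothetical helper):

```python
import torch

def fgsm_perturb(embeds: torch.Tensor, loss: torch.Tensor, epsilon: float) -> torch.Tensor:
    """One FGSM step: shift each coordinate by epsilon in the direction of the loss gradient's sign."""
    # embeds must require grad and be part of the graph that produced loss.
    grad, = torch.autograd.grad(loss, embeds, retain_graph=True)
    return (embeds + epsilon * grad.sign()).detach()
```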

@zhuchen03 (Owner)

I have added the hyperparameters for 8 of the GLUE tasks in the bash script.

For epsilon, in the current setting, you can set it to 0 first, which places no restriction on the maximum norm, and tune the other hyperparameters. In this way, the maximum norm is bounded by the ascent step size, the number of ascent steps, and the initialization.
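Concretely, if each ascent step is a normalized gradient step of size `adv_lr` and the random initialization has magnitude `adv_init_mag` (placeholder names, as above), a back-of-envelope bound on the final perturbation norm with epsilon = 0 is:

```python
# ||delta|| <= adv_init_mag + adv_steps * adv_lr, since each normalized
# ascent step moves delta by at most adv_lr in L2 norm.
adv_init_mag, adv_lr, adv_steps = 0.05, 0.1, 3  # illustrative values only
print(adv_init_mag + adv_steps * adv_lr)        # 0.35
```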

In the context of security, epsilon restricts the strength of the adversary to allow fair comparisons. In our case, however, you should first observe the norm of the embeddings, then choose a strength/epsilon that is not negligible but also won't outweigh the embeddings.
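As one quick way to observe those norms, a sketch using the HuggingFace transformers API (the checkpoint name is just an example):

```python
from transformers import AlbertModel

model = AlbertModel.from_pretrained("albert-base-v2")
emb = model.get_input_embeddings().weight   # (vocab_size, embedding_dim)
norms = emb.norm(p=2, dim=-1)               # per-token L2 norms
print(norms.mean().item(), norms.min().item(), norms.max().item())
# Pick an epsilon that is a small fraction of the typical norm, so the
# perturbation is noticeable but does not outweigh the embeddings.
```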

FFYYang (Author) commented May 17, 2020

@zhuchen03 @PantherYan thx ~!!!

@FFYYang FFYYang closed this as completed May 17, 2020