This repository was archived by the owner on Jul 3, 2024. It is now read-only.

Lower than expected final test accuracies for some models? #21

Open
mellorjc opened this issue Jul 27, 2020 · 1 comment

Comments

mellorjc commented Jul 27, 2020

Hi,

Thank you for your great resource.
I have started using your benchmark, but I have some queries about the final test accuracies returned for some models.

I'm using the nasbench_only108.tfrecord file and have checked that its sha256sum is correct.

A few models that I find reported with final test accuracies of ~10% are listed below (identified by the hash returned by nasbench.hash_iterator()):

01bcceabc42489b3af4b4496e333a86e
003d4f8e9c5a066d7b248230d8a4fcb5

However, when I train them myself for a few epochs with a constant learning rate, just to check whether the test accuracy really should be essentially random, I normally get >40% validation accuracy, so a random-level test accuracy doesn't seem right to me. (Obviously the test accuracy isn't the validation accuracy and I'm not using the same training procedure, but I wouldn't expect such different results between the two.)

I get my test accuracies from the NASBench API as follows:

from nasbench.api import NASBench, ModelSpec

# Load the dataset file mentioned above.
nasbench = NASBench('nasbench_only108.tfrecord')

for unique_hash in nasbench.hash_iterator():
    matrix = nasbench.fixed_statistics[unique_hash]['module_adjacency']
    operations = nasbench.fixed_statistics[unique_hash]['module_operations']
    spec = ModelSpec(matrix, operations)
    data = nasbench.query(spec)
    acc = 100. * data['test_accuracy']

Is this correct?

If they are incorrect, is there some systematic cause that would tell me which models' test accuracies I can trust and which I can't?
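
For reference, here is a minimal sketch of how the individual training repeats for the hashes above could be inspected, assuming I'm reading the get_metrics_from_hash API correctly (i.e. that it returns the per-repeat computed statistics keyed by epoch budget):

from nasbench.api import NASBench

nasbench = NASBench('nasbench_only108.tfrecord')

for unique_hash in ['01bcceabc42489b3af4b4496e333a86e',
                    '003d4f8e9c5a066d7b248230d8a4fcb5']:
    # computed_stats should be keyed by epoch budget (only 108 in this file),
    # with one dict per training repeat.
    _, computed_stats = nasbench.get_metrics_from_hash(unique_hash)
    for repeat, run in enumerate(computed_stats[108]):
        print(unique_hash, repeat,
              run['final_validation_accuracy'],
              run['final_test_accuracy'])

If all three repeats report ~10% test accuracy, then the values I see aren't just a single unlucky run.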

Apologies if I've misunderstood something.

Thanks again


wendli01 commented Jan 22, 2021

As far as I understand it, this happens sometimes with TensorFlow when the learning rate is borderline too high for the selected model.
The model then constantly overshoots local optima because of the large step size, and some TensorFlow-internal safeguards essentially make it output predictions like a dummy classifier.

Usually this behavior is also quite stochastic, meaning it might appear in only one of the 3 runs.
For this reason, some work simply disregards architectures that achieve <80% final accuracy as noise.
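
As a rough sketch of such a filter (assuming the get_metrics_from_hash API returns the per-repeat computed statistics keyed by epoch budget, as in the snippet above), something like this could work:

import numpy as np
from nasbench.api import NASBench

nasbench = NASBench('nasbench_only108.tfrecord')

trusted, noisy = [], []
for unique_hash in nasbench.hash_iterator():
    _, computed_stats = nasbench.get_metrics_from_hash(unique_hash)
    # Mean final test accuracy over the 3 training repeats.
    mean_test_acc = np.mean([run['final_test_accuracy']
                             for run in computed_stats[108]])
    # Treat architectures below 80% mean accuracy as untrainable / noise.
    (noisy if mean_test_acc < 0.8 else trusted).append(unique_hash)

print(len(trusted), 'kept,', len(noisy), 'discarded as noise')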

However, it appears that the models you listed could not be trained in any of the 3 runs, i.e. across 3 different random initializations.
Also, the models were trained with cosine learning rate decay, which should prevent this kind of training problem in the first place.

It is probably an issue with the TPU v2 architecture.
