This repository was archived by the owner on Jul 3, 2024. It is now read-only.

Lower than expected final test accuracies for some models? #21

Open
mellorjc opened this issue Jul 27, 2020 · 1 comment

Comments

mellorjc commented Jul 27, 2020

Hi,

Thank you for your great resource.
I have started using your benchmark, but I have some queries about the final test accuracies returned for some models.

I'm using the nasbench_only108.tfrecord file and have checked that its sha256sum is correct.

A few models that I find reported with final test accuracies of ~10% are listed below (identified by the hash returned by nasbench.hash_iterator()):

01bcceabc42489b3af4b4496e333a86e
003d4f8e9c5a066d7b248230d8a4fcb5

However, when I train them myself for a few epochs with a constant learning rate, just to check whether the test accuracy really should be essentially random, I normally get >40% validation accuracy, so a random-level test accuracy doesn't seem right to me. (Obviously the test accuracy isn't the validation accuracy and I'm not using the same training procedure, but I wouldn't expect such different results between the two.)

I get my test accuracies from the NASBench API as follows:

from nasbench.api import NASBench, ModelSpec

# Load the dataset file mentioned above.
nasbench = NASBench('nasbench_only108.tfrecord')

for unique_hash in nasbench.hash_iterator():
    matrix = nasbench.fixed_statistics[unique_hash]['module_adjacency']
    operations = nasbench.fixed_statistics[unique_hash]['module_operations']
    spec = ModelSpec(matrix, operations)
    data = nasbench.query(spec)
    acc = 100. * data['test_accuracy']

Is this correct?

If they are incorrect, is there some systematic cause that would tell me which models' test accuracies I can trust and which I can't?
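
For reference, here is a minimal sketch of how the individual training repeats for the hashes above could be inspected, assuming I'm reading the get_metrics_from_hash API correctly (i.e. that it returns the per-repeat computed statistics keyed by epoch budget):

from nasbench.api import NASBench

nasbench = NASBench('nasbench_only108.tfrecord')

for unique_hash in ['01bcceabc42489b3af4b4496e333a86e',
                    '003d4f8e9c5a066d7b248230d8a4fcb5']:
    # computed_stats should be keyed by epoch budget (only 108 in this file),
    # with one dict per training repeat.
    _, computed_stats = nasbench.get_metrics_from_hash(unique_hash)
    for repeat, run in enumerate(computed_stats[108]):
        print(unique_hash, repeat,
              run['final_validation_accuracy'],
              run['final_test_accuracy'])

If all three repeats report ~10% test accuracy, then the values I see aren't just a single unlucky run.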

Apologies if I've misunderstood something.

Thanks again


wendli01 commented Jan 22, 2021

As far as I understand it, this happens sometimes with TensorFlow when the learning rate is borderline too high for the selected model.
The model then constantly overshoots local optima because of the large step size, and some TensorFlow-internal safeguards essentially make it output predictions like a dummy classifier.

Usually this behavior is also quite stochastic, meaning it might appear in only one of the 3 runs.
For this reason, some work simply disregards architectures that achieve <80% final accuracy as noise.
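
As a rough sketch of such a filter (assuming the get_metrics_from_hash API returns the per-repeat computed statistics keyed by epoch budget, as in the snippet above), something like this could work:

import numpy as np
from nasbench.api import NASBench

nasbench = NASBench('nasbench_only108.tfrecord')

trusted, noisy = [], []
for unique_hash in nasbench.hash_iterator():
    _, computed_stats = nasbench.get_metrics_from_hash(unique_hash)
    # Mean final test accuracy over the 3 training repeats.
    mean_test_acc = np.mean([run['final_test_accuracy']
                             for run in computed_stats[108]])
    # Treat architectures below 80% mean accuracy as untrainable / noise.
    (noisy if mean_test_acc < 0.8 else trusted).append(unique_hash)

print(len(trusted), 'kept,', len(noisy), 'discarded as noise')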

However, it appears that the models you listed could not be trained in any of the 3 runs, i.e. across 3 different random initializations.
Also, the models were trained with cosine learning rate decay, which should prevent this kind of training problem in the first place.

It is probably an issue with the TPU v2 architecture.
