Hello there,
when comparing this code to the implementation in tensorflow/models, I found that the two use different layers as the output of the VGGish model (if the activation is counted as a separate layer):
yours: torchvggish/torchvggish/vggish.py, line 19 (commit 4670116)
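For reference, the embedding head defined around that line looks roughly like this (a sketch based on vggish.py at that commit; treat the exact layer sizes as approximate):

```python
import torch.nn as nn

# Embedding head of the port: three fully-connected layers, each followed by
# a ReLU -- including the final 128-d layer, which is the activation at issue.
embeddings = nn.Sequential(
    nn.Linear(512 * 4 * 6, 4096),
    nn.ReLU(True),
    nn.Linear(4096, 4096),
    nn.ReLU(True),
    nn.Linear(4096, 128),
    nn.ReLU(True),  # google's version has no activation here (activation_fn=None)
)
```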
google's: https://github.com/tensorflow/models/blob/f32dea32e3e9d3de7ed13c9b16dc7a8fea3bd73d/research/audioset/vggish/vggish_slim.py#L104-L106 (activation_fn=None)
It's also mentioned in the README: "Note that the embedding layer does not include a final non-linear activation, so the embedding value is pre-activation."
Changing the output layer of VGGish in your implementation to the pre-activation one (i.e. without the final ReLU) makes the embeddings (almost) equal in both cases, raw and PCA'ed.
Thanks for porting though, great work!
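First, I would like to echo the kudos for publishing this port of VGGish. I am implementing a Fréchet Audio Distance (FAD) library and will definitely make use of it.
For anyone else who arrives here looking for a workaround, the final ReLU can be removed from the pretrained VGGish model with a snippet along these lines (a minimal sketch, assuming the model is loaded via torch.hub as in the README and that it exposes the embedding head as model.embeddings):

```python
import torch
import torch.nn as nn

# Load the pretrained VGGish port via torch.hub (as shown in the README).
model = torch.hub.load('harritaylor/torchvggish', 'vggish')
model.eval()

# The embedding head ends in a ReLU (vggish.py, line 19). Swapping it for an
# identity yields the pre-activation 128-d embeddings, matching the
# tensorflow/models layer built with activation_fn=None.
model.embeddings[-1] = nn.Identity()
```

With the final activation removed, the raw and PCA'ed embeddings should (almost) match the tensorflow/models output, as described above.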