This paper proposes CondConv, an input-adaptive convolution methodology that is mathematically equivalent to a mixture-of-experts but much more efficient to compute. CondConv can replace standard convolution layers and shows a significant performance improvement for a small amount of additional computation (it even shows a small improvement with less computation).
Strengths:
- Simple and effective method that works as a drop-in replacement for standard convolutions
- State-of-the-art performance and thorough ablation studies

Weaknesses:
- The number of experts is pre-defined (although this issue is not at all critical)
- Follow-up works show that the performance and efficiency can be further improved
The method is very simple and can be summarized as follows:
$ Output(x) = \sigma((\alpha_1 \cdot W_1 + ... + \alpha_n \cdot W_n) * x) $
where each $W_i$ is an expert convolution kernel, $*$ denotes convolution, $\sigma$ is the activation function, and each $\alpha_i = r_i(x)$ is an example-dependent routing weight computed from the input as $r(x) = \mathrm{Sigmoid}(\mathrm{GlobalAveragePool}(x)\, R)$.
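The equation above can be sketched in a few lines of NumPy for the 1x1-kernel case, where convolution reduces to per-pixel channel mixing. This is a minimal illustration, not the paper's code: the function name, shapes, and the choice of ReLU for $\sigma$ are assumptions. The key point is that combining the expert kernels *before* the convolution gives the same output as the mixture-of-experts formulation (combining expert outputs), but requires only one convolution.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def condconv_1x1(x, experts, R):
    """Illustrative CondConv sketch for the 1x1-kernel case.

    x:       input feature map, shape (C_in, H, W)
    experts: n expert kernels W_1..W_n, shape (n, C_out, C_in)
    R:       routing matrix, shape (C_in, n)
    """
    # Routing weights: sigmoid of a linear function of the
    # globally average-pooled input features
    pooled = x.mean(axis=(1, 2))                  # (C_in,)
    alpha = sigmoid(pooled @ R)                   # (n,)
    # Combine expert kernels first (the efficient formulation) ...
    W = np.einsum("n,noc->oc", alpha, experts)    # (C_out, C_in)
    # ... then apply a single 1x1 convolution (channel mixing)
    out = np.einsum("oc,chw->ohw", W, x)          # (C_out, H, W)
    # sigma: assumed to be ReLU here
    return np.maximum(out, 0.0)
```

By linearity of convolution, this is term-by-term equal to first applying each expert kernel separately and then mixing the outputs with the same $\alpha_i$, which is why the paper can claim mixture-of-experts expressiveness at roughly the cost of one convolution.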
ImageNet classification results:
Ablation studies on the effect of routing architectures and how many convolution layers to replace with CondConv:
Visualizations of what kind of semantically meaningful features each expert focuses on:
--
Aug. 24, 2020 Note by Myungsub