ALAN: Self-Attention Is Not All You Need for Image Super-Resolution

By: Qiangpu Chen; Jinghui Qin; Wushao Wen

Vision Transformer (ViT)-based image super-resolution (SR) methods have achieved impressive performance, surpassing CNN-based SR methods by using Multi-Head Self-Attention (MHSA) to model long-range dependencies. However, the quadratic complexity of MHSA and the inefficiency of non-parallelized window partitioning severely slow inference, hindering the deployment of these SR methods in scenarios that demand both speed and quality. To address this issue, we propose an Asymmetric Large-kernel Attention Network (ALAN) that adopts a stage-to-block design paradigm inspired by ViT. In ALAN, the core block, named the Asymmetric Large Kernel Convolution Block (ALKCB), follows a structure similar to the Swin Transformer layer but replaces MHSA with our proposed Asymmetric Depth-Wise Convolution Attention (ADWCA) to improve both SR quality and inference speed. ADWCA has linear complexity and uses large-kernel depth-wise dilated convolution with a Hadamard product as the attention map. We also explore a structural re-parameterization technique that strengthens the kernel skeletons with asymmetric convolutions. Experimental results demonstrate that ALAN achieves state-of-the-art performance with faster inference than ViT-based models and fewer parameters than CNN-based models. Specifically, the tiny variant ALAN-T is 3× smaller than ShuffleMixer with comparable performance, and ALAN is 4× faster than SwinIR-S with a 0.1 dB gain in PSNR.
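To make the ADWCA idea concrete, below is a minimal PyTorch sketch of a large-kernel depth-wise attention block, plus a helper showing how asymmetric 1×k / k×1 branches could be folded back into a k×k kernel skeleton for inference. The class name `LargeKernelDWAttention`, the kernel sizes (5 and 7 with dilation 3), and the `fuse_asymmetric` helper are illustrative assumptions for exposition, not the authors' exact configuration.

```python
import torch
import torch.nn as nn


class LargeKernelDWAttention(nn.Module):
    """Large-kernel depth-wise attention with linear complexity.

    A sketch in the spirit of ADWCA: a large receptive field is built from
    a depth-wise conv followed by a depth-wise dilated conv and a
    point-wise conv, and the result gates the input via a Hadamard product.
    """

    def __init__(self, dim: int):
        super().__init__()
        # Local depth-wise convolution (kernel size is an assumption).
        self.dw = nn.Conv2d(dim, dim, kernel_size=5, padding=2, groups=dim)
        # Depth-wise dilated convolution enlarges the receptive field to an
        # effective 19x19 kernel while keeping cost linear in pixel count.
        self.dw_dilated = nn.Conv2d(dim, dim, kernel_size=7, padding=9,
                                    dilation=3, groups=dim)
        # Point-wise convolution mixes channels.
        self.pw = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.pw(self.dw_dilated(self.dw(x)))
        # Hadamard (element-wise) product acts as the attention map,
        # avoiding the quadratic cost of self-attention.
        return attn * x


def fuse_asymmetric(kxk: torch.Tensor, kx1: torch.Tensor,
                    x1k: torch.Tensor) -> torch.Tensor:
    """Fold trained 1xk and kx1 branches into the kxk kernel skeleton.

    Hypothetical helper illustrating structural re-parameterization: the
    asymmetric kernels are added onto the centre column and row of the
    square kernel, so inference runs a single convolution.
    """
    k = kxk.shape[-1]
    fused = kxk.clone()
    fused[..., :, k // 2:k // 2 + 1] += kx1  # centre column <- kx1 branch
    fused[..., k // 2:k // 2 + 1, :] += x1k  # centre row <- 1xk branch
    return fused


if __name__ == "__main__":
    x = torch.randn(1, 48, 64, 64)          # (batch, channels, H, W)
    block = LargeKernelDWAttention(dim=48)
    print(block(x).shape)                    # torch.Size([1, 48, 64, 64])
```

Under this reading, the asymmetric branches add representational strength only during training; once folded into the square kernel, inference pays no extra runtime cost.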
