Enhancing Single-Image Super-Resolution using Patch-Mosaic Data Augmentation on Lightweight Bimodal Network

Authors

DOI:

https://doi.org/10.4108/eetinis.v10i2.2774

Keywords:

single-image super-resolution, data augmentation, vision transformer, CNN

Abstract

With the advancement of deep learning, single-image super-resolution (SISR) has made significant strides. However, most current SISR methods are difficult to deploy in real-world applications because their complex operations incur substantial computational and memory costs. Furthermore, a suitable dataset is a key factor in effective model training. Hybrid models that combine a CNN and a Vision Transformer can be more efficient for the SISR task, but they require large or extremely high-quality training datasets, which are not always available. To tackle these issues, this research proposes a solution that combines a Lightweight Bimodal Network (LBNet) with a Patch-Mosaic data augmentation method, an enhancement of CutMix and YOCO. With patch-oriented Mosaic data augmentation, an efficient Symmetric CNN is used for local feature extraction and coarse image restoration. In addition, a Recursive Transformer captures the long-term dependencies of images, so that global information can be fully used to refine texture details. Extensive experiments show that LBNet with the proposed data augmentation method, which adds no extra parameters, outperforms the original LBNet and other state-of-the-art techniques that apply image-level data augmentation.
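The abstract does not spell out how Patch-Mosaic augmentation is computed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that four paired low-resolution (LR) and high-resolution (HR) training patches are stitched into a single mosaic around a random split point, with the same cut applied to the HR patches at the upscaling factor so the pair stays aligned. The function name `patch_mosaic`, its parameters, and the four-quadrant layout are all hypothetical.

```python
import numpy as np


def patch_mosaic(lr_patches, hr_patches, scale, rng=None):
    """Stitch four paired LR/HR patches into one mosaic pair.

    Hypothetical sketch: the quadrant layout and split-point range are
    assumptions, not details taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = lr_patches[0].shape[:2]

    # Random split point in LR coordinates, kept away from the borders.
    cy = int(rng.integers(h // 4, 3 * h // 4 + 1))
    cx = int(rng.integers(w // 4, 3 * w // 4 + 1))

    lr_out = np.empty_like(lr_patches[0])
    hr_out = np.empty_like(hr_patches[0])

    # One quadrant per source patch: top-left, top-right,
    # bottom-left, bottom-right (in LR coordinates).
    quads = [
        (slice(0, cy), slice(0, cx)),
        (slice(0, cy), slice(cx, w)),
        (slice(cy, h), slice(0, cx)),
        (slice(cy, h), slice(cx, w)),
    ]
    for (rows, cols), lr, hr in zip(quads, lr_patches, hr_patches):
        lr_out[rows, cols] = lr[rows, cols]
        # The same region in HR coordinates is the LR region times the scale.
        hr_rows = slice(rows.start * scale, rows.stop * scale)
        hr_cols = slice(cols.start * scale, cols.stop * scale)
        hr_out[hr_rows, hr_cols] = hr[hr_rows, hr_cols]
    return lr_out, hr_out
```

Because the operation only rearranges pixels of existing training patches, it introduces no learnable parameters, which is consistent with the abstract's claim that the augmentation adds none to the network.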

Published

25-05-2023

How to Cite

Nguyen, Q. T., & Quang Hieu, T. (2023). Enhancing Single-Image Super-Resolution using Patch-Mosaic Data Augmentation on Lightweight Bimodal Network. EAI Endorsed Transactions on Industrial Networks and Intelligent Systems, 10(2), e1. https://doi.org/10.4108/eetinis.v10i2.2774