Transformer Based Ship Detector: An Improvement on Feature Map and Tiny Training Set
DOI: https://doi.org/10.4108/eetinis.v12i1.6794

Keywords: Object Detection, Maritime Border Security, Deformable DETR

Abstract
The rapid growth of commodity exchange has heightened the need for maritime border security in recent years. One of the most critical tasks in maritime border security is ship detection inside and outside the territorial sea. Conventionally, this task requires a substantial human workload. Fortunately, with the rapid advances in digital cameras and deep-learning techniques, computer programs can now handle object detection well enough to replace much of this manual labor. This paper therefore studies how to apply recent state-of-the-art deep-learning networks to the ship detection task. We find that, with a suitable number of object queries, Deformable DETR outperforms state-of-the-art ship detectors. Moreover, comprehensive experiments on datasets of different scales show that the method significantly improves results when training samples are limited. Finally, the feature maps produced by the method focus well on the key objects in an image.
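For readers unfamiliar with the object-query mechanism the abstract refers to: in DETR-style detectors such as Deformable DETR, the decoder is driven by a fixed set of learnable query embeddings, and the number of rows in that embedding table, which bounds how many objects can be detected per image, is the hyperparameter tuned here. The following is a minimal, hypothetical PyTorch sketch of how that hyperparameter enters a DETR-style decoder; it is an illustration only, and names such as num_queries and MiniDETRDecoder are our own, not taken from the paper.

import torch
import torch.nn as nn

class MiniDETRDecoder(nn.Module):
    """Toy DETR-style decoder: the object-query count is a free
    hyperparameter, independent of the image feature map."""

    def __init__(self, num_queries=100, hidden_dim=256, num_classes=1):
        super().__init__()
        # One learnable embedding per object query; each query can
        # decode into at most one detection.
        self.query_embed = nn.Embedding(num_queries, hidden_dim)
        layer = nn.TransformerDecoderLayer(d_model=hidden_dim, nhead=8,
                                           batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=6)
        self.class_head = nn.Linear(hidden_dim, num_classes + 1)  # +1: "no object"
        self.box_head = nn.Linear(hidden_dim, 4)                  # (cx, cy, w, h)

    def forward(self, memory):
        # memory: (batch, num_tokens, hidden_dim) features from the encoder
        batch = memory.size(0)
        queries = self.query_embed.weight.unsqueeze(0).expand(batch, -1, -1)
        hs = self.decoder(queries, memory)
        return self.class_head(hs), self.box_head(hs).sigmoid()

# Fewer queries suit maritime scenes with only a handful of ships,
# which is the kind of tuning the abstract alludes to:
decoder = MiniDETRDecoder(num_queries=30)
feats = torch.randn(2, 400, 256)   # dummy encoder output for 2 images
logits, boxes = decoder(feats)     # shapes: (2, 30, 2) and (2, 30, 4)

For reference, DETR defaults to 100 object queries and Deformable DETR to 300; reducing the count shrinks the set of candidate detections the decoder must suppress, which is one plausible reason a smaller value helps on sparse ship scenes.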
License
Copyright (c) 2024 EAI Endorsed Transactions on Industrial Networks and Intelligent Systems
This open-access article is licensed under a Creative Commons Attribution 3.0 Unported License (CC BY 3.0), which permits unlimited use, distribution, and reproduction in any medium, provided the original work is properly cited.