FrequencyFormer: Oriented Object Detection with Frequency Transformer
DOI: https://doi.org/10.4108/airo.10701
Keywords: Transformer, Oriented object, Frequency
Abstract
Detecting objects with oriented bounding boxes has shown impressive generalization in challenging scenes with densely packed, arbitrarily rotated objects. Existing oriented object detectors rely on customized operations such as anchor pre-definition and NMS post-processing to improve accuracy. However, these components usually incur substantial computational cost and complicate the pipeline, which limits the scalability of existing methods. In this paper, we propose a new paradigm, FrequencyFormer, for end-to-end oriented object detection. Built upon the Transformer-based encoder-decoder framework, two key ingredients are proposed to adapt it to detect oriented objects robustly. First, a frequency-boosted query update strategy is designed to enhance the shape encoding of object queries by incorporating the frequency vectors of oriented objects. Second, a dynamic matching strategy is introduced to facilitate the training process, in which the matching weights are adjusted adaptively as training progresses. Experimental results on the DOTA and HRSC2016 datasets demonstrate that our FrequencyFormer achieves competitive performance with state-of-the-art methods.
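The abstract only sketches the two ingredients at a high level. Below is a minimal, hypothetical PyTorch sketch of how a frequency-boosted query update of this kind could look: the shape of each predicted oriented object (sampled as boundary points) is transformed into a low-frequency DCT vector, embedded, and added to the corresponding object query. The module name FrequencyQueryUpdate, the choice of a DCT over sampled boundary points, and all tensor shapes are assumptions made for illustration, not the paper's actual implementation.

import math
import torch
import torch.nn as nn

def dct_matrix(n: int) -> torch.Tensor:
    # DCT-II basis; rows are frequency components over an n-point signal.
    k = torch.arange(n).unsqueeze(1).float()        # frequency index
    i = torch.arange(n).unsqueeze(0).float()        # sample index
    return torch.cos(math.pi / n * (i + 0.5) * k)   # shape (n, n)

class FrequencyQueryUpdate(nn.Module):
    # Hypothetical frequency-boosted query update: boundary points of the
    # predicted oriented box are mapped to a frequency vector (low DCT
    # coefficients), embedded, and added to the decoder object queries.
    def __init__(self, d_model: int = 256, num_points: int = 18, num_freq: int = 8):
        super().__init__()
        # Keep only the lowest num_freq frequencies of the contour signal.
        self.register_buffer("basis", dct_matrix(num_points)[:num_freq])
        self.freq_embed = nn.Linear(2 * num_freq, d_model)

    def forward(self, queries: torch.Tensor, boundary_pts: torch.Tensor) -> torch.Tensor:
        # queries: (num_queries, batch, d_model)
        # boundary_pts: (num_queries, batch, num_points, 2), e.g. from the
        # previous decoder layer's intermediate box predictions.
        freq = torch.einsum("fn,qbnc->qbfc", self.basis, boundary_pts)  # DCT along the contour
        freq = freq.flatten(-2)                                         # (num_queries, batch, 2*num_freq)
        return queries + self.freq_embed(freq)                          # frequency-boosted queries

Similarly, a dynamic matching strategy could be as simple as interpolating the Hungarian-matching cost weights over training. The linear schedule and the particular weight values below are illustrative assumptions, not the weights reported in the paper.

def matching_weights(epoch: int, total_epochs: int,
                     w_cls=(2.0, 1.0), w_box=(1.0, 2.0)):
    # Hypothetical schedule: shift the matching cost linearly from
    # class-dominant to box-dominant as training progresses.
    t = epoch / max(total_epochs - 1, 1)
    lerp = lambda a, b: (1 - t) * a + t * b
    return lerp(*w_cls), lerp(*w_box)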
License
Copyright (c) 2025 Shuai Liu, Haiming Wang, Zhibin Li, Peiyang Wei

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
This is an open access article distributed under the terms of the CC BY-NC-SA 4.0 license, which permits copying, redistributing, remixing, transforming, and building upon the material in any medium, so long as the original work is properly cited.