A Point Cloud Instance Segmentation Framework with Attention Mechanisms and Semantic Refinement
DOI:
https://doi.org/10.4108/eetsis.11524
Keywords:
point cloud instance segmentation, reverse attention mechanism, self-attention mechanism, feature interleaving
Abstract
INTRODUCTION: Point cloud instance segmentation, a critical 3D computer vision task, faces significant challenges in complex indoor environments. Existing methods often suffer from insufficient feature extraction and from a strong coupling between semantic prediction and instance features, so that semantic errors cascade into the instance stage and limit overall accuracy.
OBJECTIVES: This paper proposes an innovative approach using attention mechanisms and semantic refinement to address these limitations. The primary goal is to enhance feature representation and alleviate the strong dependency between semantic prediction and instance segmentation.
METHODS: We introduce three key innovations: 1) a reverse attention mechanism to improve multi-level feature fusion; 2) an instance soft clustering strategy that incorporates semantic scores to weaken the semantic-instance coupling; and 3) a self-attention-based instance refinement network. Finally, a dual-branch scoring mechanism, combining classification and mask scores, jointly determines confidence levels to further mitigate semantic errors.
RESULTS: Evaluated on the S3DIS dataset, our model achieved 74.1% mean precision (mPrec) on the Area 5 test split. In the more rigorous six-fold cross-validation, it achieved 76.8% mPrec and 72.3% mean recall (mRec), outperforming the state-of-the-art (SOTA) model SoftGroup by 1.5 and 2.5 percentage points, respectively.
CONCLUSION: The proposed method significantly improves instance segmentation accuracy and demonstrates stronger robustness in complex scenes. It effectively resolves the strong coupling issue, providing a novel technical pathway for point cloud instance segmentation. While primarily optimized for dense indoor scans, adapting this framework for extremely sparse outdoor point clouds remains a compelling direction for future exploration.
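The two ideas in METHODS that aim to weaken the semantic-instance coupling can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the threshold `tau`, the mask threshold, and the choice to fuse the two branch scores by multiplication are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one point's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_candidate_classes(logits, tau=0.2):
    """Return every class whose semantic score exceeds tau.

    ASSUMPTION: tau is a hypothetical threshold. Unlike a hard argmax, a point
    can remain a candidate for several classes, so one wrong top-1 label does
    not exclude it from instance grouping.
    """
    scores = softmax(logits)
    return [c for c, s in enumerate(scores) if s > tau]


def instance_confidence(cls_score, mask_probs, mask_thresh=0.5):
    """Combine a classification score with a mask-quality score.

    ASSUMPTION: the two branches are fused by multiplication, and mask quality
    is the mean probability of the points inside the predicted mask.
    """
    inside = [p for p in mask_probs if p > mask_thresh]
    if not inside:
        return 0.0
    mask_score = sum(inside) / len(inside)
    return cls_score * mask_score


if __name__ == "__main__":
    # A point whose two best classes are nearly tied stays a candidate for both.
    print(soft_candidate_classes([2.0, 1.8, -3.0]))
    # A confident class label with a mediocre mask gets a reduced confidence.
    print(instance_confidence(0.9, [0.8, 0.6, 0.2]))
```

In this sketch a proposal needs both a plausible class and a clean mask to rank highly, which is one way a joint score could dampen the effect of a semantic misprediction.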
License
Copyright (c) 2026 Xi Chen, Meiji Chen, Hao Lin

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
This is an open access article distributed under the terms of the CC BY-NC-SA 4.0, which permits copying, redistributing, remixing, transformation, and building upon the material in any medium so long as the original work is properly cited.