SAM2CLIP2SAM: Vision Language Model for Segmentation of 3D CT Scans for Covid-19 Detection
DOI:
https://doi.org/10.4108/eetpht.11.9010
Keywords:
RACNet, SAM, CLIP, segmentation, classification, Covid-19 detection, COV-19 CT-DB
Abstract
This paper presents a new approach for effective image segmentation that can be integrated into any model and methodology; the use case we choose is the classification of medical images (3-D chest CT scans) for Covid-19 detection. Our approach combines vision-language models that segment the CT scans, and the segmented scans are then fed to a deep neural architecture, named RACNet, for Covid-19 detection. In particular, we introduce a novel segmentation framework, named SAM2CLIP2SAM, that leverages the strengths of both the Segment Anything Model (SAM) and Contrastive Language-Image Pre-Training (CLIP) to accurately segment the right and left lungs in CT scans; these segmented outputs are subsequently fed into RACNet to classify Covid-19 and non-Covid-19 cases. First, SAM produces multiple part-based segmentation masks for each slice in the CT scan; then CLIP selects only the masks that are associated with the regions of interest (ROIs), i.e., the right and left lungs; finally, SAM is given these ROIs as prompts and generates the final segmentation mask for the lungs.
Experiments on two annotated Covid-19 databases illustrate the improved performance obtained when our method is used to segment the CT scans.
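The three-stage SAM → CLIP → SAM flow described above can be sketched as follows. This is a minimal, hypothetical Python sketch, not the authors' implementation: the real SAM and CLIP models are replaced by injected callables (`generate_masks`, `score_mask` and `segment_with_box` are illustrative names, not library APIs), masks are represented as sets of pixel coordinates, and several details the abstract leaves open are assumptions: CLIP's selection is modelled as "keep the highest-scoring candidate per text prompt", the prompts "right lung" and "left lung" are illustrative, and the selected ROIs are assumed to be passed back to SAM as bounding boxes.

```python
from typing import Callable, FrozenSet, List, Optional, Sequence, Tuple

Pixel = Tuple[int, int]          # (row, col)
Mask = FrozenSet[Pixel]          # binary mask stored as its set of foreground pixels
Box = Tuple[int, int, int, int]  # (row_min, col_min, row_max, col_max), inclusive

# Hypothetical stand-ins for the real models (assumptions, not actual SAM/CLIP APIs):
MaskGenerator = Callable[[object], List[Mask]]     # SAM, automatic (prompt-free) mode
MaskScorer = Callable[[object, Mask, str], float]  # CLIP image-text similarity of a region
BoxSegmenter = Callable[[object, Box], Mask]       # SAM, box-prompted mode


def mask_to_box(mask: Mask) -> Box:
    """Tight bounding box around a non-empty mask, used as a SAM-style prompt."""
    rows = [r for r, _ in mask]
    cols = [c for _, c in mask]
    return (min(rows), min(cols), max(rows), max(cols))


def sam2clip2sam_slice(
    ct_slice: object,
    generate_masks: MaskGenerator,
    score_mask: MaskScorer,
    segment_with_box: BoxSegmenter,
    roi_prompts: Sequence[str] = ("right lung", "left lung"),  # assumed text prompts
) -> List[Optional[Mask]]:
    """Return one refined mask per ROI prompt (None if SAM proposed no candidates)."""
    # Stage 1 (SAM): part-based candidate masks for this slice; drop empty ones.
    candidates = [m for m in generate_masks(ct_slice) if m]
    refined: List[Optional[Mask]] = []
    for prompt in roi_prompts:
        if not candidates:
            refined.append(None)
            continue
        # Stage 2 (CLIP): keep the candidate that best matches the ROI description.
        best = max(candidates, key=lambda m: score_mask(ct_slice, m, prompt))
        # Stage 3 (SAM): re-segment the slice, prompted with the selected region's box.
        refined.append(segment_with_box(ct_slice, mask_to_box(best)))
    return refined


def segment_scan(
    slices: Sequence[object],
    generate_masks: MaskGenerator,
    score_mask: MaskScorer,
    segment_with_box: BoxSegmenter,
) -> List[Mask]:
    """Run the per-slice pipeline over a 3-D scan and merge both lungs per slice."""
    lung_masks: List[Mask] = []
    for ct_slice in slices:
        parts = sam2clip2sam_slice(ct_slice, generate_masks, score_mask, segment_with_box)
        lung_masks.append(frozenset().union(*(p for p in parts if p is not None)))
    return lung_masks
```

Injecting the three models as callables keeps the control flow testable with toy functions. In a real setup one would wrap SAM's automatic mask generator, a CLIP model that scores a masked or cropped slice against a text prompt, and SAM's box-prompted predictor behind these signatures. How the resulting lung masks are turned into RACNet's input is not described on this page, so the sketch stops at the per-slice masks.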
Copyright (c) 2024 Dimitrios Kollias, Anastasios Arsenos, James Wingate, Stefanos Kollias

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
This is an open access article distributed under the terms of the CC BY-NC-SA 4.0, which permits copying, redistributing, remixing, transformation, and building upon the material in any medium so long as the original work is properly cited.