SAM2CLIP2SAM: Vision Language Model for Segmentation of 3D CT Scans for Covid-19 Detection

Dimitrios Kollias; Anastasios Arsenos; James Wingate; Stefanos Kollias

doi:10.4108/eetpht.11.9010

Authors

Dimitrios Kollias, Queen Mary University of London
Anastasios Arsenos, National Technical University of Athens
James Wingate, University of Lincoln
Stefanos Kollias, National Technical University of Athens

DOI:

https://doi.org/10.4108/eetpht.11.9010

Keywords:

RACNet, SAM, CLIP, segmentation, classification, Covid-19 detection, COV-19 CT-DB

Abstract

This paper presents a new approach to image segmentation that can be integrated into any model or methodology; the paradigm we choose is the classification of medical images (3-D chest CT scans) for COVID-19 detection. Our approach combines vision-language models that segment the CT scans with a deep neural architecture, named RACNet, that takes the segmented scans as input and performs COVID-19 detection. In particular, we introduce a novel segmentation framework, named SAM2CLIP2SAM, that leverages the strengths of both the Segment Anything Model (SAM) and Contrastive Language-Image Pre-Training (CLIP) to accurately segment the right and left lungs in CT scans; the segmented outputs are then fed into RACNet, which classifies each case as COVID-19 or non-COVID-19. The framework works in three steps. First, SAM produces multiple part-based segmentation masks for each slice of the CT scan. Then, CLIP selects only the masks associated with the regions of interest (ROIs), i.e., the right and left lungs. Finally, SAM is given these ROIs as prompts and generates the final segmentation mask for the lungs.
Experiments on two COVID-19 annotated databases illustrate the improved performance obtained when our method is used to segment the CT scans.
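
The three-step flow described in the abstract can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the three stages are passed in as plain callables so that no real SAM or CLIP API is assumed, and every name here (`segment_slice`, `propose_masks`, `score_mask`, `refine_with_prompts`, the text labels, the `threshold` value) is hypothetical. Converting the selected masks into bounding-box prompts is likewise an assumption; the abstract says only that the ROIs are given to SAM as prompts.

```python
from typing import Callable, List, Sequence, Set, Tuple

Pixel = Tuple[int, int]          # (row, col)
Mask = Set[Pixel]                # pixels belonging to one candidate region
Box = Tuple[int, int, int, int]  # (row_min, col_min, row_max, col_max)


def bounding_box(mask: Mask) -> Box:
    """Smallest axis-aligned box that contains every pixel of the mask."""
    rows = [r for r, _ in mask]
    cols = [c for _, c in mask]
    return (min(rows), min(cols), max(rows), max(cols))


def segment_slice(
    ct_slice,
    propose_masks: Callable[[object], List[Mask]],            # stands in for SAM, step 1
    score_mask: Callable[[object, Mask, str], float],         # stands in for CLIP, step 2
    refine_with_prompts: Callable[[object, List[Box]], Mask], # stands in for SAM, step 3
    texts: Sequence[str] = ("right lung", "left lung"),       # assumed text labels
    threshold: float = 0.5,                                   # assumed cut-off
) -> Mask:
    # Step 1: generate many part-based candidate masks for the slice.
    candidates = propose_masks(ct_slice)
    # Step 2: keep only masks whose best text match reaches the threshold.
    selected = [
        m for m in candidates
        if m and max(score_mask(ct_slice, m, t) for t in texts) >= threshold
    ]
    if not selected:
        return set()
    # Step 3: turn the kept ROIs into prompts and ask for the final mask.
    prompts = [bounding_box(m) for m in selected]
    return refine_with_prompts(ct_slice, prompts)


# Toy stand-ins so the sketch runs end to end (0 = background, 1 = lung, 2 = other).
toy_slice = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 2, 1, 0],
    [0, 1, 0, 2, 1, 0],
    [0, 0, 0, 0, 0, 0],
]


def toy_propose(ct_slice) -> List[Mask]:
    # One candidate mask per (value, column) group of non-background pixels.
    groups = {}
    for r, row in enumerate(ct_slice):
        for c, value in enumerate(row):
            if value != 0:
                groups.setdefault((value, c), set()).add((r, c))
    return list(groups.values())


def toy_score(ct_slice, mask: Mask, text: str) -> float:
    # Pretend similarity: 1.0 if the mask covers only lung pixels; the text is ignored.
    return 1.0 if all(ct_slice[r][c] == 1 for r, c in mask) else 0.0


def toy_refine(ct_slice, boxes: List[Box]) -> Mask:
    # Pretend prompted segmentation: lung pixels that fall inside any box prompt.
    out: Mask = set()
    for r0, c0, r1, c1 in boxes:
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                if ct_slice[r][c] == 1:
                    out.add((r, c))
    return out


result = segment_slice(toy_slice, toy_propose, toy_score, toy_refine)
```

In this toy run the candidate covering the non-lung structure is discarded in step 2, so `result` contains only the pixels of the two lung columns. For a full 3-D scan, the same function would be applied slice by slice before the segmented slices are passed on to the classifier.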
