Power grid inspection based on multimodal foundation models

Authors

  • Jingbo Hao Nanchang Institute of Science & Technology, Nanchang 330108, China & Hunan Chaoneng Robot, Changsha 410003, China
  • Yang Tao Nanchang Vocational University, Nanchang 330007, China

DOI:

https://doi.org/10.4108/ew.9087

Keywords:

power grid inspection, foundation model, large language model, multimodal application

Abstract

INTRODUCTION: With the development of large foundation models, power grid inspection is transitioning from traditional deep learning to multimodal foundation models.

OBJECTIVES: This paper aims to advance the application of multimodal foundation models in power grid inspection.

METHODS: Current research on foundation models and on multimodal large language models (LLMs) is reviewed. Three application forms of multimodal foundation models in power grid inspection are explored, and the reliability of these models is discussed.

RESULTS: These techniques can significantly reduce the time and cost of inspection by automating the analysis of large amounts of sensor data. They can also improve the accuracy and reliability of inspection by leveraging the understanding and reasoning abilities of LLMs.

CONCLUSION: These advanced techniques show great application potential in power grid inspection. However, they should not entirely replace human inspectors, who can validate automated findings and address issues that the models alone may miss.
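The RESULTS paragraph describes coupling automated analysis of sensor data with the reasoning abilities of LLMs. A minimal, hypothetical Python sketch of that coupling is shown below; both model calls are stubbed, and the function names, tower ID, and defect labels are illustrative assumptions rather than anything from the paper. In practice the stubs would wrap real vision-foundation-model and LLM APIs.

```python
def detect_defects(image_path):
    """Stand-in for a vision foundation model (e.g. an open-vocabulary
    detector) that tags conditions in an inspection image.
    Returns (label, confidence) pairs; these values are fabricated."""
    return [("broken insulator", 0.91), ("vegetation encroachment", 0.67)]

def build_inspection_prompt(tower_id, detections, threshold=0.5):
    """Compose the detector's confident findings into a prompt that an
    LLM could reason over to assess severity and suggest actions."""
    findings = [f"- {label} (confidence {conf:.2f})"
                for label, conf in detections if conf >= threshold]
    return (
        f"Inspection report request for tower {tower_id}.\n"
        "Detected conditions:\n" + "\n".join(findings) + "\n"
        "Assess severity and recommend maintenance actions."
    )

prompt = build_inspection_prompt("T-042", detect_defects("tower_042.jpg"))
print(prompt)
```

The design point is the division of labor the abstract implies: the vision model handles perception over large volumes of imagery, while the LLM supplies understanding and reasoning over the distilled findings.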

Downloads


Published

14-04-2025

How to Cite

Hao J, Tao Y. Power grid inspection based on multimodal foundation models. EAI Endorsed Trans Energy Web [Internet]. 2025 Apr. 14 [cited 2025 Jun. 5];12. Available from: https://publications.eai.eu/index.php/ew/article/view/9087