Building an Intelligent Home Perception System Based on Multi-Modal Information Interaction

DOI: https://doi.org/10.4108/eetsis.10349

Abstract

Traditional smart homes rely on a single interaction modality and have weak perception ability. To address these problems, this paper proposes a framework for invoking Artificial Intelligence (AI) models for multi-modal information perception. The framework schedules visual, voice, and sensor data through natural language prompts and combines this with a zero-shot visual recognition method built on a cloud-based visual-language hybrid large model workflow, achieving cross-scene generalization without labeled training data. It addresses the problems of heterogeneous data fusion and the insufficient computing power of edge devices. Experimental results show that the multi-modal smart home perception system designed in this paper achieves an accuracy of over 90% in environmental perception and a precision of up to 92% in user intention recognition, providing new ideas and a practical foundation for multi-modal perception in future smart home technology.
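The idea of scheduling visual, voice, and sensor data through a natural language prompt can be illustrated with a minimal sketch. This is not the authors' implementation: the keyword table, the `Reading` type, and the functions `select_modalities` and `build_request` are hypothetical names introduced only for illustration, and no cloud model is actually called. It assumes that a prompt is matched against simple keywords to decide which modalities to bundle into a request.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Reading:
    modality: str  # "vision", "voice", or "sensor"
    payload: str   # stand-in for a camera frame, a transcript, or a sensor value


# Hypothetical keyword table (an assumption, not taken from the paper).
MODALITY_KEYWORDS: Dict[str, Set[str]] = {
    "vision": {"camera", "look", "who"},
    "voice": {"said", "heard", "say"},
    "sensor": {"temperature", "humidity", "smoke"},
}


def select_modalities(prompt: str) -> List[str]:
    """Return the modalities whose keywords appear as whole words in the prompt."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    return [m for m, keys in MODALITY_KEYWORDS.items() if words & keys]


def build_request(prompt: str, readings: List[Reading]) -> dict:
    """Bundle the prompt with only the readings it asks about.

    The returned dict is the payload a cloud-hosted model would receive;
    sending it is deliberately left out of this sketch.
    """
    wanted = select_modalities(prompt)
    inputs = [
        {"modality": r.modality, "payload": r.payload}
        for r in readings
        if r.modality in wanted
    ]
    return {"prompt": prompt, "inputs": inputs}


if __name__ == "__main__":
    readings = [
        Reading("vision", "frame_0001"),
        Reading("voice", "turn on the lights"),
        Reading("sensor", "temperature=23.5"),
    ]
    print(build_request("Who is at the camera and what is the temperature?", readings))
```

In this sketch only the selected readings leave the device, which reflects the abstract's point that heavy recognition runs in the cloud while the edge device does lightweight routing.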

Published

31-03-2026

Section

Scheduling optimization and load balancing in scalable distributed systems

How to Cite

Guo Zhanmiao, Zhongli Q. Building an Intelligent Home Perception System Based on Multi-Modal Information Interaction. EAI Endorsed Scal Inf Syst [Internet]. 2026 Mar. 31 [cited 2026 Apr. 9];12(8). Available from: https://publications.eai.eu/index.php/sis/article/view/10349