Enhancing Medical Question-Answering Systems with Knowledge Graph-Integrated Large Language Models: A Comparative Analysis

Authors

Lin J, Ouyang S

DOI:

https://doi.org/10.4108/eetpht.11.11670

Keywords:

Knowledge Graph Prompt Engineering, Medical QA, Large Language Models, SPARQL, Wikidata

Abstract

This study investigates the impact of integrating knowledge graph prompt engineering (KGPE) with large language models in the context of medical question answering. The Hugging Face MedQA dataset (N = 5,000) was utilised to extract key medical entities via named entity recognition, and SPARQL-based relational prompts were constructed from the Wikidata knowledge base to guide the reasoning process. Two models, Llama-2-7B-chat-hf and Qwen-2-7B-Instruct, were evaluated through a weighted aggregation of BLEU, ROUGE, and cosine similarity metrics. The findings show that Qwen-2-7B-Instruct attains substantial gains under KGPE, with BLEU rising from 0.366 to 0.531 (+0.165) and cosine similarity from 0.763 to 0.820 (+0.057). Conversely, Llama-2-7B-chat-hf exhibits a modest decrease, indicating divergent responsiveness to structured knowledge. These results demonstrate that integrating structured knowledge through KGPE enhances factual accuracy and semantic coherence in medical reasoning without modifying model architecture.
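The pipeline described in the abstract (retrieve entity facts from Wikidata via SPARQL, prepend them to the question as a structured prompt, then score answers by a weighted metric aggregate) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the example entity (asthma, Wikidata QID Q35869), its triple, and the metric weights are all assumptions chosen for demonstration.

```python
# Illustrative sketch of a KGPE pipeline: SPARQL query construction,
# knowledge-graph-augmented prompt assembly, and weighted metric aggregation.
# Entity, QID, triple, and weights below are hypothetical examples.

def build_sparql(qid: str, limit: int = 10) -> str:
    """Build a SPARQL query retrieving (property, value) label pairs
    for a recognised medical entity from Wikidata."""
    return (
        "SELECT ?propLabel ?valueLabel WHERE {\n"
        f"  wd:{qid} ?prop ?value .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        f"LIMIT {limit}"
    )

def build_kgpe_prompt(question: str, triples: list) -> str:
    """Prepend retrieved knowledge-graph triples to the question so the
    model reasons over structured facts rather than parametric memory alone."""
    facts = "\n".join(f"- {s} | {p} | {o}" for s, p, o in triples)
    return f"Known facts:\n{facts}\n\nQuestion: {question}\nAnswer:"

def weighted_score(bleu: float, rouge: float, cosine: float,
                   weights=(0.3, 0.3, 0.4)) -> float:
    """Aggregate BLEU, ROUGE, and cosine similarity into a single score.
    The weights here are illustrative, not the paper's values."""
    wb, wr, wc = weights
    return wb * bleu + wr * rouge + wc * cosine

# Example usage with a hypothetical entity (asthma, Wikidata QID Q35869):
query = build_sparql("Q35869")
prompt = build_kgpe_prompt(
    "What is the first-line reliever medication for acute asthma?",
    [("asthma", "drug or therapy used for treatment", "salbutamol")],
)
score = weighted_score(bleu=0.531, rouge=0.60, cosine=0.820)
```

In practice the SPARQL query would be sent to the Wikidata Query Service endpoint and the returned triples filtered for relevance before prompt assembly; here the triple is hard-coded to keep the sketch self-contained.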

References

[1] Zhao, W.X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., Dong, Z., Du, Y., Yang, C., Chen, Y., Chen, Z., Jiang, J., Ren, R., Li, Y., Tang, X., Liu, Z., Liu, P., Nie, J., & Wen, J. (2023). A Survey of Large Language Models. ArXiv, abs/2303.18223.

[2] Naveed, H., Khan, A.U., Qiu, S., Saqib, M., Anwar, S., Usman, M., Barnes, N., & Mian, A.S. (2023). A Comprehensive Overview of Large Language Models. ArXiv, abs/2307.06435.

[3] Kamalloo, E., Dziri, N., Clarke, C.L., & Rafiei, D. (2023). Evaluating Open-Domain Question Answering in the Era of Large Language Models. ArXiv, abs/2305.06984.

[5] Lai, V.D., Ngo, N.T., Veyseh, A.P., Man, H., Dernoncourt, F., Bui, T., & Nguyen, T.H. (2023). ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large Language Models in Multilingual Learning. ArXiv, abs/2304.05613.

[6] Thirunavukarasu, A.J., Ting, D.S., Elangovan, K., Gutierrez, L., Tan, T.F., & Ting, D.S. (2023). Large language models in medicine. Nature Medicine, 29, 1930-1940.

[7] Borkowski, A.A., Jakey, C.E., Mastorides, S.M., Kraus, A.L., Vidyarthi, G., Viswanadhan, N.A., & Lezama, J.L. (2023). Applications of ChatGPT and Large Language Models in Medicine and Health Care: Benefits and Pitfalls. Federal Practitioner, 40(6), 170-173.

[8] Omiye, J.A., Gui, H., Rezaei, S.J., Zou, J., & Daneshjou, R. (2023). Large Language Models in Medicine: The Potentials and Pitfalls. Annals of Internal Medicine, 177, 210-220.

[9] Wang, Y. (2024). Application of large language models based on knowledge graphs in question-answering systems: A review. Applied and Computational Engineering.

[10] Yang, Y., Li, K., Yan, Y., & Zhu, J. (2022). Research on the Development Process and Construction of Domain-specific Knowledge Graph. 2022 IEEE Asia-Pacific Conference on Image Processing, Electronics and Computers (IPEC), 708-711.

[11] Khetan, V., Annervaz, K.M., Wetherley, E.B., Eneva, E., Sengupta, S., & Fano, A.E. (2021). Knowledge Graph Anchored Information-Extraction for Domain-Specific Insights. ArXiv, abs/2104.08936.

[12] Remy, F., Demuynck, K., & Demeester, T. (2023). BioLORD-2023: Semantic Textual Representations Fusing LLM and Clinical Knowledge Graph Insights. ArXiv, abs/2311.16075.

[13] Sun, J., Xu, C., Tang, L., Wang, S., Lin, C., Gong, Y., Ni, L.M., Shum, H., & Guo, J. (2023). Think-on-Graph: Deep and Responsible Reasoning of Large Language Model on Knowledge Graph. International Conference on Learning Representations.

[14] Gao, Y., Li, R., Croxford, E., Tesch, S., To, D., Caskey, J., Patterson, B.W., Churpek, M.M., Miller, T., Dligach, D., & Afshar, M. (2023). Large Language Models and Medical Knowledge Grounding for Diagnosis Prediction. medRxiv.

[15] Yang, J. (2024). Integrated Application of LLM Model and Knowledge Graph in Medical Text Mining and Knowledge Extraction. Social Medicine and Health Management.

[16] Wang, Y., Jiang, B., Luo, Y., He, D., Cheng, P., & Gao, L. (2024). Reasoning on Efficient Knowledge Paths: Knowledge Graph Guides Large Language Model for Domain Question Answering. ArXiv, abs/2404.10384.

[17] Yang, R., Liu, H., Marrese-Taylor, E., Zeng, Q., Ke, Y.H., Li, W., Cheng, L., Chen, Q., Caverlee, J., Matsuo, Y., & Li, I. (2024). KG-Rank: Enhancing Large Language Models for Medical QA with Knowledge Graphs and Ranking Techniques. ArXiv, abs/2403.05881.

[18] Wang, Y., Lipka, N., Rossi, R.A., Siu, A.F., Zhang, R., & Derr, T. (2023). Knowledge Graph Prompting for Multi-Document Question Answering. AAAI Conference on Artificial Intelligence.

[19] Baek, J., Aji, A., & Saffari, A. (2023). Knowledge-Augmented Language Model Prompting for Zero-Shot Knowledge Graph Question Answering. Proceedings of the First Workshop on Matching From Unstructured and Structured Data (MATCHING 2023).

[20] Soman, K., Rose, P.W., Morris, J.H., Akbas, R.E., Smith, B., Peetoom, B., Villouta-Reyes, C., Cerono, G., Shi, Y., Rizk-Jackson, A., Israni, S., Nelson, C.A., Huang, S., & Baranzini, S. (2023). Biomedical knowledge graph-optimized prompt generation for large language models. Bioinformatics, 40.

[21] Haidar, M.A., & Kurimo, M. (2017). LDA-based context dependent recurrent neural network language model using document-based topic distribution of words. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5730-5734.

[22] Venugopalan, M., & Gupta, D. (2022). An enhanced guided LDA model augmented with BERT based semantic strength for aspect term extraction in sentiment analysis. Knowl. Based Syst., 246, 108668.

[23] Al-Besher, A., Kumar, K., Sangeetha, M., & Butsa, T. (2022). BERT for Conversational Question Answering Systems Using Semantic Similarity Estimation. Computers, Materials & Continua.

[24] Mäntylä, M., Claes, M., & Farooq, U. (2018). Measuring LDA topic stability from clusters of replicated runs. Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement.

[25] Riza Rizky, L.M., & Suyanto, S. (2021). Improving Stance-based Fake News Detection using BERT Model with Synonym Replacement and Random Swap Data Augmentation Technique. 2021 IEEE 7th Information Technology International Seminar (ITIS), 1-6.

[26] Lan, F. (2022). Research on Text Similarity Measurement Hybrid Algorithm with Term Semantic Information and TF-IDF Method. Advances in Multimedia.

[27] Dahir, S., Elhassouni, J., Qadi, A.E., & Bennis, H. (2021). Medical Query Expansion using Semantic Sources DBpedia and Wikidata. International Symposium on Intelligent Control.

[28] Spaulding, E., Conger, K., Gershman, A., Uceda-Sosa, R., Brown, S.W., Pustejovsky, J., Anick, P., & Palmer, M. (2023). The DARPA Wikidata Overlay: Wikidata as an ontology for natural language processing. International Symposium on Algorithms.

[29] Nguyen, P., & Takeda, H. (2022). Wikidata-lite for Knowledge Extraction and Exploration. 2022 IEEE International Conference on Big Data (Big Data), 3684-3686.

[30] Albade, J.V., & Salisbury, J.P. (2022). Social Media Event Detection Using Spacy Named Entity Recognition and Spectral Embeddings. World Congress on Electrical Engineering and Computer Systems and Science.

[31] Bian, J., Zheng, J., Zhang, Y., & Zhu, S. (2023). Inspire the Large Language Model by External Knowledge on BioMedical Named Entity Recognition. ArXiv, abs/2309.12278.

[32] Yang, A., Yang, B., Hui, B., Zheng, B., Yu, B., Zhou, C., Li, C., Li, C., Liu, D., Huang, F., Dong, G., Wei, H., Lin, H., Tang, J., Wang, J., Yang, J., Tu, J., Zhang, J., Ma, J., Xu, J., Zhou, J., Bai, J., He, J., Lin, J., Dang, K., Lu, K., Chen, K., Yang, K., Li, M., Xue, M., Ni, N., Zhang, P., Wang, P., Peng, R., Men, R., Gao, R., Lin, R., Wang, S., Bai, S., Tan, S., Zhu, T., Li, T., Liu, T., Ge, W., Deng, X., Zhou, X., Ren, X., Zhang, X., Wei, X., Ren, X., Fan, Y., Yao, Y., Zhang, Y., Wan, Y., Chu, Y., Cui, Z., Zhang, Z., & Fan, Z. (2024). Qwen2 Technical Report. ArXiv, abs/2407.10671.

[33] Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., Azhar, F., Rodriguez, A., Joulin, A., Grave, E., & Lample, G. (2023). LLaMA: Open and Efficient Foundation Language Models. ArXiv, abs/2302.13971.

[34] Touvron, H., Martin, L., Stone, K.R., Albert, P., Almahairi, A., Babaei, Y., Bashlykov, N., Batra, S., Bhargava, P., Bhosale, S., Bikel, D.M., Blecher, L., Ferrer, C.C., Chen, M., Cucurull, G., Esiobu, D., Fernandes, J., Fu, J., Fu, W., Fuller, B., Gao, C., Goswami, V., Goyal, N., Hartshorn, A.S., Hosseini, S., Hou, R., Inan, H., Kardas, M., Kerkez, V., Khabsa, M., Kloumann, I.M., Korenev, A.V., Koura, P.S., Lachaux, M., Lavril, T., Lee, J., Liskovich, D., Lu, Y., Mao, Y., Martinet, X., Mihaylov, T., Mishra, P., Molybog, I., Nie, Y., Poulton, A., Reizenstein, J., Rungta, R., Saladi, K., Schelten, A., Silva, R., Smith, E.M., Subramanian, R., Tan, X., Tang, B., Taylor, R., Williams, A., Kuan, J.X., Xu, P., Yan, Z., Zarov, I., Zhang, Y., Fan, A., Kambadur, M., Narang, S., Rodriguez, A., Stojnic, R., Edunov, S., & Scialom, T. (2023). Llama 2: Open Foundation and Fine-Tuned Chat Models. ArXiv, abs/2307.09288.

[35] Wang, C., Hua, M., Song, J., & Tang, X. (2023). Knowledge Graphs Enhanced Large Language Model Prompt for Electric Power Question Answering. Proceedings of the 2023 7th International Conference on Electronic Information Technology and Computer Engineering.

[36] Rabby, G., Auer, S., D'Souza, J., & Oelen, A. (2024). Fine-tuning and Prompt Engineering with Cognitive Knowledge Graphs for Scholarly Knowledge Organization.

[37] Wang, K., Xu, Y., Wu, Z., & Luo, S. (2024). LLM as Prompter: Low-resource Inductive Reasoning on Arbitrary Knowledge Graphs. Annual Meeting of the Association for Computational Linguistics.

[38] Wei, J., Bosma, M., Zhao, V., Guu, K., Yu, A.W., Lester, B., Du, N., Dai, A.M., & Le, Q.V. (2021). Finetuned Language Models Are Zero-Shot Learners. ArXiv, abs/2109.01652.

[39] Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language Models are Unsupervised Multitask Learners.

[40] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., & Iwasawa, Y. (2022). Large Language Models are Zero-Shot Reasoners. ArXiv, abs/2205.11916.

[41] Kim, S., Joo, S.J., Kim, D., Jang, J., Ye, S., Shin, J., & Seo, M. (2023). The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning. ArXiv, abs/2305.14045.

[42] Banwasi, A., Sun, X., Ravindranath, R., & Vazquez, M. (2023). Self Evaluation Using Zero-shot Learning. 2023 5th International Conference on Robotics and Computer Vision (ICRCV), 278-282.

[43] Wan, X., Sun, R., Dai, H., Arik, S.Ö., & Pfister, T. (2023). Better Zero-Shot Reasoning with Self-Adaptive Prompting. ArXiv, abs/2305.14106.

[44] Antonucci, A., Piqué, G., & Zaffalon, M. (2023). Zero-shot Causal Graph Extrapolation from Text via LLMs. ArXiv, abs/2312.14670.

[45] Zhang, Q., Dong, J., Chen, H., Zha, D., Yu, Z., & Huang, X. (2023). KnowGPT: Knowledge Graph based Prompting for Large Language Models.

[46] Khatun, R., & Sinhababu, N. (2023). Improved Sequence Predictions using Knowledge Graph Embedding for Large Language Models. Proceedings of the Third International Conference on AI-ML Systems.

[47] Chepurova, A., Kuratov, Y., Bulatov, A., & Burtsev, M. (2024). Prompt Me One More Time: A Two-Step Knowledge Extraction Pipeline with Ontology-Based Verification. Proceedings of TextGraphs-17: Graph-based Methods for Natural Language Processing.

[48] Yao, L., Peng, J., Mao, C., & Luo, Y. (2023). Exploring Large Language Models for Knowledge Graph Completion. ArXiv, abs/2308.13916.

[49] Qian, C., Zhao, X., & Wu, S.T. (2023). "Merge Conflicts!" Exploring the Impacts of External Distractors to Parametric Knowledge Graphs. ArXiv, abs/2309.08594.

Published

28-01-2026

How to Cite

Lin J, Ouyang S. Enhancing Medical Question-Answering Systems with Knowledge Graph-Integrated Large Language Models: A Comparative Analysis. EAI Endorsed Trans Perv Health Tech [Internet]. 2026 Jan. 28 [cited 2026 Jan. 28];11. Available from: https://publications.eai.eu/index.php/phat/article/view/11670