MFSF-CEA: A Multi-Feature Similarity Fusion Model for Chinese Entity Alignment

DOI:

https://doi.org/10.4108/eetsis.10735

Keywords:

Entity alignment, multi-feature, entity abstract, LDA

Abstract

INTRODUCTION: Entity alignment across multi-source encyclopedic knowledge bases is crucial for constructing high-quality knowledge graphs. This task is particularly challenging in specialized vertical domains like Chinese cultural relics, where heterogeneous data sources and diverse descriptive patterns render single-feature alignment methods inadequate.

OBJECTIVES: To address this challenge, we propose MFSF-CEA, a Multi-Feature Similarity Fusion model for Chinese Entity Alignment that is optimized at two layers: multi-granularity semantic modeling and domain-adaptive dynamic weight fusion.

METHODS: The model captures semantics at three levels: character-level similarity using the Longest Common Subsequence (LCS) for variant character matching; word-level similarity via TF-IDF for core concept association; and sentence-level semantic similarity through Latent Dirichlet Allocation (LDA) for deep topic alignment. Beyond feature extraction, we introduce a combined Entropy-AHP weighting mechanism (the entropy weight method together with the Analytic Hierarchy Process) that dynamically balances objective information contribution against domain expert knowledge, overcoming the limitations of fixed-weight fusion strategies.

RESULTS: Experimental evaluation on a Chinese cultural relics dataset shows that MFSF-CEA significantly outperforms baseline methods in precision, recall, and F1-score. The sentence-level contextual features contribute most to alignment accuracy, while multi-feature fusion compensates for the limitations of any single feature type, particularly the sparsity of word-level abstract features. By leveraging complementary features across multiple linguistic levels, the framework addresses the distinctive challenges of entity alignment in cultural relic texts.

CONCLUSION: This work provides an effective and extensible solution for knowledge fusion in vertical domains, advancing entity alignment from traditional string matching toward deeper semantic integration.
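
The abstract names the building blocks but not their exact formulas, so the following is only a minimal Python sketch of how the character-level and word-level similarities and the Entropy-AHP fusion could fit together. Everything specific here is an assumption rather than the authors' method: the LCS normalisation `2 * LCS / (len(a) + len(b))`, the smoothed IDF, the row geometric-mean approximation of AHP weights, the multiplicative combination of the two weight vectors, and all function names. The sentence-level LDA similarity is left out because it needs a topic-modelling library; it is assumed to arrive as a precomputed score in [0, 1].

```python
import math
from collections import Counter


def lcs_length(a, b):
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a, b):
    """Character-level similarity in [0, 1] (assumed normalisation)."""
    if not a and not b:
        return 1.0
    return 2 * lcs_length(a, b) / (len(a) + len(b))


def tfidf_cosine(doc_a, doc_b, corpus):
    """Word-level similarity: cosine of TF-IDF vectors over tokenised documents."""
    n = len(corpus)
    df = Counter(term for doc in corpus for term in set(doc))

    def vector(doc):
        tf = Counter(doc)
        # Smoothed IDF (an assumption) keeps the weight finite and positive.
        return {t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
                for t, c in tf.items()}

    va, vb = vector(doc_a), vector(doc_b)
    dot = sum(w * vb.get(t, 0.0) for t, w in va.items())
    norm = (math.sqrt(sum(w * w for w in va.values()))
            * math.sqrt(sum(w * w for w in vb.values())))
    return dot / norm if norm else 0.0


def entropy_weights(scores):
    """Objective weights: rows are candidate pairs (at least two), columns are features."""
    n, m = len(scores), len(scores[0])
    divergences = []
    for j in range(m):
        column = [row[j] for row in scores]
        total = sum(column)
        if total == 0:
            divergences.append(0.0)  # a feature that is always zero carries no information
            continue
        entropy = -sum((x / total) * math.log(x / total)
                       for x in column if x > 0) / math.log(n)
        divergences.append(max(0.0, 1.0 - entropy))  # clamp floating-point noise
    s = sum(divergences)
    return [d / s for d in divergences] if s else [1.0 / m] * m


def ahp_weights(pairwise):
    """Subjective weights from an expert pairwise-comparison matrix
    (row geometric-mean approximation of the principal eigenvector)."""
    gm = [math.prod(row) ** (1.0 / len(row)) for row in pairwise]
    total = sum(gm)
    return [g / total for g in gm]


def combine_weights(objective, subjective):
    """Multiplicative synthesis of the two weight vectors, renormalised to sum to 1."""
    products = [o * s for o, s in zip(objective, subjective)]
    total = sum(products)
    return [p / total for p in products]


def fused_similarity(features, weights):
    """Final alignment score: weighted sum of the per-level similarities."""
    return sum(f * w for f, w in zip(features, weights))
```

In this sketch a feature whose scores barely vary across candidate pairs receives a low entropy weight, and the AHP vector then lets domain experts raise or lower that weight; a linear blend of the two vectors would be an equally plausible reading of "combined weighting".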

Published

8 April 2026

Section

Scheduling optimization and load balancing in scalable distributed systems

How to Cite

Zhang M, Yang L, Gao Y, Han L, Caimei B. MFSF-CEA: A Multi-Feature Similarity Fusion Model for Chinese Entity Alignment. EAI Endorsed Scal Inf Syst [Internet]. 2026 Apr. 8 [cited 2026 Apr. 9];12(9). Available from: https://publications.eai.eu/index.php/sis/article/view/10735