Optimization of Deep Learning-Based Patent Infringement Text Comparison and Retrieval Algorithms

Authors

  • Juan Zou, International Business School, Guangdong University of Finance & Economics

DOI:

https://doi.org/10.4108/eetsis.12346

Keywords:

deep learning, patent infringement, text comparison, retrieval algorithm optimization, LDA topic model, COV-BiGRU twin neural network

Abstract

INTRODUCTION: Subtle differences in technical features make it difficult to accurately identify potential infringement risks from patent texts. By integrating deep semantic representations with deep matching mechanisms, deep learning can model the complex semantic structures and local feature correlations within patent texts, which enhances sensitivity to minute technical variations and improves decision support for infringement risk assessment. OBJECTIVES: This study explores deep learning-based optimization methods for patent infringement risk text comparison and retrieval algorithms. METHODS: A macro-level patent text comparison and retrieval layer is constructed by integrating Word2vec and LDA topic models. This layer models patent text semantics, generates word-level semantic vectors for patents, and enables rapid comparison and retrieval of candidate patent sets related to the target patent at the technical feature semantic level. A COV-BiGRU twin neural network is introduced as the fine-grained comparison and retrieval layer. This layer optimizes the macro-level retrieval algorithm by performing granular semantic matching and Manhattan distance calculations between candidate patent texts and the target patent text. Based on these results, patents posing an infringement risk are retrieved. RESULTS: During macro-level retrieval, the method reduces the candidate patent text set to 0.457% of the full database. In the fine-grained retrieval optimization phase, it achieves 100% recall of known high-risk patents while effectively excluding non-infringing patents that share similar technical themes but differ in specific technical features. CONCLUSION: These results validate the preliminary effectiveness of the optimized dual-layer retrieval algorithm for texts with fine-grained differences on the tested patent corpus.
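
The dual-layer idea described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the vectors are hand-written toy values standing in for the Word2vec/LDA representations and the twin-network encodings, and the names `two_stage_retrieve`, `top_k`, and `threshold` are hypothetical. The fine-grained score uses exp(-L1 distance), a common way to turn a Manhattan distance into a similarity in twin networks; the abstract does not state the exact form used, so this is an assumption.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def manhattan_similarity(u, v):
    """Map the Manhattan (L1) distance to (0, 1]; 1.0 means identical vectors.
    Assumption: exp(-distance), not confirmed by the abstract."""
    return math.exp(-sum(abs(a - b) for a, b in zip(u, v)))

def two_stage_retrieve(target, corpus, top_k=2, threshold=0.5):
    """Hypothetical two-layer retrieval.
    target: (coarse_vector, fine_vector); corpus: {patent_id: (coarse, fine)}."""
    coarse_t, fine_t = target
    # Layer 1 (macro level): keep the top_k patents by coarse cosine similarity.
    ranked = sorted(corpus, key=lambda pid: cosine(coarse_t, corpus[pid][0]),
                    reverse=True)
    candidates = ranked[:top_k]
    # Layer 2 (fine-grained): score candidates by Manhattan similarity and
    # keep only those at or above the threshold.
    scored = [(pid, manhattan_similarity(fine_t, corpus[pid][1]))
              for pid in candidates]
    return [(pid, score) for pid, score in scored if score >= threshold]

# Toy data (assumed values, for illustration only).
target = ([1.0, 0.0, 0.2], [0.5, 0.5])
corpus = {
    "P1": ([0.9, 0.1, 0.2], [0.5, 0.6]),   # same theme, near-identical features
    "P2": ([1.0, 0.0, 0.3], [2.0, -1.0]),  # same theme, different features
    "P3": ([0.0, 1.0, 0.0], [0.5, 0.5]),   # different theme
}

result = two_stage_retrieve(target, corpus)
print(result)
```

In this toy run, P3 is dropped at the macro level because its theme differs, and P2 passes the macro level but is excluded at the fine-grained level because its feature encoding is far from the target in Manhattan distance. Only P1 is returned.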

Published

30-06-2026

Section

Data Security and Privacy Protection in New Distributed Networks and System

How to Cite

1. Zou J. Optimization of Deep Learning-Based Patent Infringement Text Comparison and Retrieval Algorithms. EAI Endorsed Scal Inf Syst [Internet]. 2026 Jun. 30 [cited 2026 Jul. 2];12(12). Available from: https://publications.eai.eu/index.php/sis/article/view/12346