Transforming Data with Ontology and Word Embedding for an Efficient Classification Framework

Thi Thanh Sang Nguyen; Pham Minh Thu Do; Thanh Tuan Nguyen; Thanh Tho Quan

doi:10.4108/eetinis.v10i2.2726

Authors

Thi Thanh Sang Nguyen School of Computer Science and Engineering, International University, VNU-HCMC, Hochiminh City, Vietnam
Pham Minh Thu Do School of Computer Science and Engineering, International University, VNU-HCMC, Hochiminh City, Vietnam
Thanh Tuan Nguyen University of Greenwich
Thanh Tho Quan Ho Chi Minh City University of Technology


Keywords:

ontology, Onto2Vec, Doc2Vec, classification

Abstract

Transforming data into appropriate formats is crucial because it can speed up training and improve the performance of classification algorithms. It is, however, challenging: the process is complicated and resource-intensive, and the transformed data must preserve the meaning of the original. This study proposes new approaches to building knowledge representation models using word-embedding and ontology techniques. These models transform text data into numerical data while preserving its semantic and contextual information, which benefits subsequent data modeling. To evaluate the effectiveness of the built models, a classification framework is proposed and applied to a real-world public dataset. Experimental results show that the constructed knowledge representation models contribute significantly to the performance of the classification methods.
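The abstract leaves the implementation unspecified, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' framework. Hand-written toy word vectors (`WORD_VECTORS`) stand in for embeddings learned by Word2Vec/Doc2Vec. A toy term-to-concept map (`ONTOLOGY`, `CONCEPT_VECTORS`) stands in for an ontology whose concepts share the word-vector space, loosely in the spirit of Onto2Vec. Documents are embedded by simple averaging, which is a simpler technique than Doc2Vec. A nearest-centroid classifier using cosine similarity stands in for whatever classification methods the framework actually evaluates. All names and data here are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy 3-dimensional word vectors; real ones would be learned
# (e.g. with Word2Vec/Doc2Vec), not written by hand.
WORD_VECTORS = {
    "novel": [0.9, 0.1, 0.0],
    "poetry": [0.8, 0.2, 0.1],
    "author": [0.7, 0.3, 0.0],
    "python": [0.1, 0.9, 0.2],
    "algorithm": [0.0, 0.8, 0.3],
    "code": [0.1, 0.7, 0.4],
}

# Hypothetical ontology: term -> parent concept. Concept vectors live in the
# same space as word vectors (an assumption loosely inspired by Onto2Vec).
ONTOLOGY = {"novel": "Literature", "poetry": "Literature",
            "python": "Programming", "algorithm": "Programming"}
CONCEPT_VECTORS = {"Literature": [1.0, 0.0, 0.0], "Programming": [0.0, 1.0, 0.0]}
DIM = 3


def embed(text):
    """Average the word vectors and ontology-concept vectors of known tokens."""
    vectors = []
    for token in text.lower().split():
        if token in WORD_VECTORS:
            vectors.append(WORD_VECTORS[token])
        if token in ONTOLOGY:  # enrich with the token's ontology concept
            vectors.append(CONCEPT_VECTORS[ONTOLOGY[token]])
    if not vectors:  # no known tokens: fall back to the zero vector
        return [0.0] * DIM
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(DIM)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def train_centroids(docs, labels):
    """Nearest-centroid training: the mean document embedding per class."""
    sums = defaultdict(lambda: [0.0] * DIM)
    counts = defaultdict(int)
    for doc, label in zip(docs, labels):
        sums[label] = [s + v for s, v in zip(sums[label], embed(doc))]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}


def classify(text, centroids):
    """Assign the label whose centroid is most cosine-similar to the text."""
    vec = embed(text)
    return max(centroids, key=lambda lab: cosine(vec, centroids[lab]))


train_docs = ["a novel by a famous author", "collected poetry",
              "python code", "an algorithm in python"]
train_labels = ["books", "books", "tech", "tech"]
centroids = train_centroids(train_docs, train_labels)
print(classify("poetry and a novel", centroids))  # books
print(classify("python algorithm", centroids))    # tech
```

The concept vectors pull documents that mention related terms toward the same region of the vector space, which is the intuition behind combining ontology knowledge with word embeddings. Averaging and nearest-centroid classification were chosen only to keep the sketch short and dependency-free.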

