Transforming Data with Ontology and Word Embedding for an Efficient Classification Framework
DOI:
https://doi.org/10.4108/eetinis.v10i2.2726
Keywords:
ontology, Onto2Vec, Doc2Vec, Classification
Abstract
Transforming data into appropriate formats is crucial because it can speed up the training process and enhance the performance of classification algorithms. It is challenging, however, because the process is complicated and resource-intensive, and the meaning of the data must be preserved. This study proposes new approaches to building knowledge representation models using word-embedding and ontology techniques, which transform text data into numerical data while retaining its semantic and contextual information, so that later modeling of the data is improved. To evaluate the effectiveness of the built models, a classification framework is proposed and applied to a public real-world dataset. Experimental results show that the constructed knowledge representation models contribute significantly to the performance of classification methods.
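The general idea of turning text into numerical vectors and then classifying those vectors can be sketched as follows. This is only a minimal illustration under stated assumptions, not the method of the paper: plain word-count vectors stand in for the Doc2Vec and Onto2Vec representations, a nearest-centroid rule with cosine similarity stands in for the classifiers, and the toy documents, labels, and all function names are hypothetical.

```python
from collections import Counter
import math


def build_vocab(docs):
    """Map each distinct lowercase token to a column index."""
    vocab = {}
    for doc in docs:
        for tok in doc.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab


def vectorize(doc, vocab):
    """Turn a text into a fixed-length count vector.

    Assumption: this count vector is a simple stand-in for a learned
    embedding; tokens outside the vocabulary are ignored.
    """
    vec = [0.0] * len(vocab)
    for tok, n in Counter(doc.lower().split()).items():
        if tok in vocab:
            vec[vocab[tok]] = float(n)
    return vec


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def train_centroids(docs, labels, vocab):
    """Average the document vectors of each class into one centroid."""
    sums, counts = {}, Counter(labels)
    for doc, label in zip(docs, labels):
        acc = sums.setdefault(label, [0.0] * len(vocab))
        for i, x in enumerate(vectorize(doc, vocab)):
            acc[i] += x
    return {label: [x / counts[label] for x in acc]
            for label, acc in sums.items()}


def predict(doc, vocab, centroids):
    """Return the label whose centroid is most similar to the document."""
    vec = vectorize(doc, vocab)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical toy data, for illustration only.
train_docs = [
    "ontology classes describe concepts and relations",
    "owl ontology axioms define class hierarchy",
    "neural network learns word vectors from text",
    "word embedding vectors capture context of words",
]
train_labels = ["ontology", "ontology", "embedding", "embedding"]

vocab = build_vocab(train_docs)
centroids = train_centroids(train_docs, train_labels, vocab)
print(predict("ontology axioms and class relations", vocab, centroids))
```

Count vectors discard word order and context, which is the kind of information the abstract says the proposed representations aim to keep; the sketch only shows where such a representation would plug into a classification pipeline.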
License
Copyright (c) 2023 EAI Endorsed Transactions on Industrial Networks and Intelligent Systems
This work is licensed under a Creative Commons Attribution 3.0 Unported License.
This is an open-access article distributed under the terms of the Creative Commons Attribution CC BY 3.0 license, which permits unlimited use, distribution, and reproduction in any medium so long as the original work is properly cited.
Funding data
Viet Nam National University Ho Chi Minh City, grant number C2021-28-06