Opinion Mining with Density Forests
Keywords: DBSCAN Clustering, Density Forests, Opinion Mining, Hotel Reviews, Restaurant Reviews
In this paper, we propose a new approach to opinion mining based on density forests. We apply Density-Based Spatial Clustering of Applications with Noise (DBSCAN) to identify clusters of data points in a space of feature vectors representing the important features of hotel and restaurant reviews, and then use these clusters to construct random forests that classify the opinions expressed about those features as positive or negative. Our experiments use two standard datasets of hotel and restaurant reviews in two different scenarios. The experimental results show the effectiveness of our proposed
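The clustering stage described in the abstract can be illustrated with a minimal pure-Python sketch of DBSCAN. This is not the authors' implementation: the toy 2-D feature vectors, the sentiment labels, the `eps` and `min_pts` values, and the helper names (`region_query`, `dbscan`, `group_by_cluster`) are all assumptions made for illustration. The abstract does not specify how the random forests are built from the clusters, so the sketch stops at grouping the labeled reviews by cluster, which is one plausible way to prepare per-cluster training subsets.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # core point: keep growing the cluster
    return labels


def group_by_cluster(labels, sentiments):
    """Collect the sentiment labels of the reviews that fall in each cluster."""
    groups = {}
    for label, sentiment in zip(labels, sentiments):
        groups.setdefault(label, []).append(sentiment)
    return groups


# Hypothetical 2-D feature vectors for eight reviews (illustrative values only).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (9.0, 0.0)]
sentiments = ["pos", "pos", "neg", "pos", "neg", "neg", "pos", "neg"]

labels = dbscan(points, eps=0.3, min_pts=3)
groups = group_by_cluster(labels, sentiments)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, -1]
print(groups)
```

In this toy run the first four points form one dense cluster, the next three form a second, and the isolated last point is labeled as noise. Each non-noise group could then serve as the training subset for a separate classifier, although the actual construction used in the paper may differ.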
Copyright (c) 2023 EAI Endorsed Transactions on Context-aware Systems and Applications
This work is licensed under a Creative Commons Attribution 3.0 Unported License.
This is an open-access article distributed under the terms of the Creative Commons Attribution CC BY 3.0 license, which permits unlimited use, distribution, and reproduction in any medium so long as the original work is properly cited.