Opinion Mining with Density Forests





DBSCAN Clustering, Density Forests, Opinion Mining, Hotel Reviews, Restaurant Reviews


In this paper, we propose a new approach to opinion mining based on density-based forests. We apply Density-Based Spatial Clustering of Applications with Noise (DBSCAN) to identify clusters of data points in a feature-vector space that represents the important features of hotel and restaurant reviews. We then use these clusters to construct random forests that classify the opinions expressed about those features as positive or negative. Our experiments use two standard datasets of hotel and restaurant reviews in two different scenarios. The experimental results show the effectiveness of our proposed approach.
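
The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn, uses synthetic two-dimensional vectors in place of real review features, and makes one possible design choice for combining the stages, namely appending each point's DBSCAN cluster label as an extra input feature to a random forest. The names `fit_density_forest`, `X`, `y`, and all parameter values are hypothetical.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.ensemble import RandomForestClassifier


def fit_density_forest(X, y, eps=0.5, min_samples=5, n_estimators=50, seed=0):
    """Cluster X with DBSCAN, then fit a random forest on X plus the cluster label.

    Assumption: the cluster label is used as an additional feature. The paper
    may combine clusters and forests differently.
    """
    # Stage 1: density-based clustering; label -1 marks noise points.
    cluster_ids = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    # Stage 2: random forest trained on the augmented feature vectors.
    X_aug = np.column_stack([X, cluster_ids])
    forest = RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
    forest.fit(X_aug, y)
    return cluster_ids, forest


# Synthetic stand-in for review feature vectors (hypothetical data):
# 20 "positive" points near (0, 0) and 20 "negative" points near (3, 3).
rng = np.random.default_rng(0)
positive = rng.normal(loc=[0.0, 0.0], scale=0.1, size=(20, 2))
negative = rng.normal(loc=[3.0, 3.0], scale=0.1, size=(20, 2))
X = np.vstack([positive, negative])
y = np.array([1] * 20 + [0] * 20)  # 1 = positive opinion, 0 = negative opinion

cluster_ids, forest = fit_density_forest(X, y)
n_clusters = len(set(cluster_ids.tolist()) - {-1})
train_accuracy = forest.score(np.column_stack([X, cluster_ids]), y)
print("clusters found:", n_clusters)
print("training accuracy:", train_accuracy)
```

scikit-learn's `DBSCAN` offers `fit` and `fit_predict` but no `predict` method, so classifying unseen reviews with this sketch would additionally require a rule for assigning new points to clusters (for example, by nearest core sample), which is not shown here.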






How to Cite

Tran PQ, Ha DNL, Le HTM, Huynh HX. Opinion Mining with Density Forests. EAI Endorsed Trans Context Aware Syst App [Internet]. 2023 Jul. 10 [cited 2024 Apr. 25];9. Available from: https://publications.eai.eu/index.php/casa/article/view/3272