A Multifaceted Approach at Discerning Redditors Feelings Towards ChatGPT


  • Shreyansh Padarha CHRIST (Deemed to be University) Pune Lavas, India
  • S. Vijaylakshmi CHRIST (Deemed to be University) Pune Lavas, India




Opinion Mining, Topic Modelling, Generative AI, Multi-Stage Sampling, Multiple Hypothesis Testing


Generative AI platforms like ChatGPT have advanced rapidly, and traditional methods of scrutiny are not enough to assess their technological efficacy. Understanding public sentiment towards ChatGPT is crucial for anticipating the technology’s longevity and impact, while also offering a glimpse into human psychology. Social media platforms have grown tremendously in recent years, resulting in a surge of user-generated content. Among these platforms, Reddit stands out as a forum where users discuss a wide range of topics, including Generative Artificial Intelligence (GAI) and chatbots. Traditional approaches to social media sentiment analysis and opinion mining are time-consuming and resource-heavy, and often lack representativeness. This paper presents a novel multifaceted approach that integrates several techniques for better results. Data are collected through the Reddit API and prepared using multi-stage weighted and stratified sampling. NLP (Natural Language Processing) techniques are deployed for opinion mining: topic modelling with LDA (Latent Dirichlet Allocation) and STM (Structural Topic Model), together with sentiment analysis and emotion analysis using RoBERTa. To verify, substantiate and scrutinise the variables in the dataset, multiple hypotheses are tested using ANOVA, t-tests, the Kruskal–Wallis test, the chi-square test and the Mann–Whitney U test. The study contributes to the growing literature on social media sentiment analysis and has implications for understanding user experience and engagement with AI chatbots like ChatGPT.
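The abstract names multi-stage weighted and stratified sampling but does not give its details, so the following is only a minimal sketch of how such a scheme could look, not the authors' procedure. It assumes two stages: subreddits are first drawn without replacement with probability related to their size (using the Efraimidis-Spirakis key method), and items inside each drawn subreddit are then sampled per stratum with proportional allocation. The subreddit names, the `kind` stratum field, the 10% fraction and all function names are hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def weighted_sample_without_replacement(items, weights, k, rng):
    """Draw k items; larger weights are more likely to be picked.

    Efraimidis-Spirakis keys: each item gets u ** (1 / w) with u ~ U(0, 1),
    and the k items with the largest keys are kept. Weights must be > 0.
    """
    keyed = [(rng.random() ** (1.0 / w), item) for item, w in zip(items, weights)]
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in keyed[:k]]


def stratified_sample(records, stratum_of, fraction, rng):
    """Sample the same fraction from every stratum (proportional allocation)."""
    strata = defaultdict(list)
    for rec in records:
        strata[stratum_of(rec)].append(rec)
    sample = []
    for name in sorted(strata):
        members = strata[name]
        # Keep at least one record so that no stratum disappears.
        n = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, n))
    return sample


def multi_stage_sample(items_by_subreddit, n_subreddits, fraction, seed=0):
    """Stage 1: size-weighted choice of subreddits. Stage 2: stratify by 'kind'."""
    rng = random.Random(seed)
    names = sorted(items_by_subreddit)
    sizes = [len(items_by_subreddit[name]) for name in names]
    chosen = weighted_sample_without_replacement(names, sizes, n_subreddits, rng)
    return {
        name: stratified_sample(
            items_by_subreddit[name], lambda rec: rec["kind"], fraction, rng
        )
        for name in chosen
    }


def make_items(n):
    """Synthetic records: every fourth item is a post, the rest are comments."""
    return [{"id": i, "kind": "post" if i % 4 == 0 else "comment"} for i in range(n)]


# Hypothetical corpus; the names and sizes are illustrative assumptions.
demo = {
    "r/ChatGPT": make_items(200),
    "r/OpenAI": make_items(80),
    "r/artificial": make_items(40),
}

if __name__ == "__main__":
    result = multi_stage_sample(demo, n_subreddits=2, fraction=0.1, seed=42)
    for subreddit, sample in result.items():
        print(subreddit, len(sample))
```

Fixing the seed makes the draw reproducible. Proportional allocation keeps the post-to-comment ratio of each selected subreddit in its sample, which is one way to address the representativeness concern the abstract raises.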








How to Cite

S. Padarha and S. Vijaylakshmi, “A Multifaceted Approach at Discerning Redditors Feelings Towards ChatGPT”, EAI Endorsed Trans IoT, vol. 10, Jun. 2024.