Multimodal Sentiment Analysis in Natural Disaster Data on Social Media
DOI: https://doi.org/10.4108/eetsc.5860
Keywords: Multimodal Learning, Sentiment Analysis, Natural Disaster, Natural Language Processing, Image Processing
Abstract
INTRODUCTION: With the growth of the Internet, users increasingly express their opinions and emotions through textual, visual, and/or audio content. This has increased interest in multimodal analysis methods.
OBJECTIVES: This study addresses multimodal sentiment analysis on tweets related to natural disasters by combining textual and visual embeddings.
METHODS: Combining textual representations with the emotional cues in visual content enables a more comprehensive analysis. To investigate the impact of high-level visual and textual features, the study uses a three-layer neural network in which the first two layers extract features from the respective modalities and the third layer performs sentiment classification (see the sketch after the abstract).
RESULTS: In experiments on our dataset, the highest performance (77% accuracy, 71% F1-score) is achieved by using the CLIP model for images and the RoBERTa model for text.
CONCLUSION: Such analyses can serve application areas such as news agencies, advertising, social/digital media content production, and humanitarian aid organizations, and can provide important information for raising social awareness.
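To make the described pipeline concrete, the following is a minimal PyTorch sketch of the late-fusion setup: frozen CLIP and RoBERTa encoders feed a small three-layer network whose first two layers handle the image and text modalities and whose third layer outputs sentiment. The abstract does not specify layer sizes, fusion strategy, class count, or checkpoints, so the hidden width (256), three-class output, concatenation fusion, and the checkpoints openai/clip-vit-base-patch32 and roberta-base are illustrative assumptions, not the authors' exact configuration.

# Minimal late-fusion sketch; sizes, fusion, and checkpoints are assumptions.
import torch
import torch.nn as nn
from PIL import Image
from transformers import CLIPModel, CLIPProcessor, RobertaModel, RobertaTokenizer

class FusionSentimentNet(nn.Module):
    """Three-layer fusion net: two modality layers plus one sentiment layer."""
    def __init__(self, img_dim=512, txt_dim=768, hidden=256, n_classes=3):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hidden)          # layer 1: image features
        self.txt_proj = nn.Linear(txt_dim, hidden)          # layer 2: text features
        self.classifier = nn.Linear(2 * hidden, n_classes)  # layer 3: sentiment head

    def forward(self, img_feat, txt_feat):
        fused = torch.cat([torch.relu(self.img_proj(img_feat)),
                           torch.relu(self.txt_proj(txt_feat))], dim=-1)
        return self.classifier(fused)

# Frozen pretrained encoders supply the high-level features.
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
roberta = RobertaModel.from_pretrained("roberta-base").eval()
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")

image = Image.open("tweet_image.jpg")  # hypothetical disaster-tweet image
text = "Flood waters are rising fast, stay safe everyone"

with torch.no_grad():
    img_feat = clip.get_image_features(
        **clip_proc(images=image, return_tensors="pt"))                  # (1, 512)
    txt_feat = roberta(
        **tokenizer(text, return_tensors="pt")).last_hidden_state[:, 0]  # (1, 768)

model = FusionSentimentNet()
logits = model(img_feat, txt_feat)  # e.g. negative / neutral / positive scores

In this configuration only the small fusion network would be trained; keeping the pretrained encoders frozen keeps training inexpensive on a modest tweet dataset.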
License
Copyright (c) 2024 Sefa Dursun, Süleyman Eken
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
This is an open access article distributed under the terms of the CC BY-NC-SA 4.0 license, which permits copying, redistribution, remixing, transformation, and building upon the material in any medium, so long as the original work is properly cited.