Chinese Online Violent Speech Detection Based on EBLA

Hongliang Wang; Shoumin Zhang; Na Li; Jing Liu; Peng Zhang

doi:10.4108/eetsis.10318

Authors

Hongliang Wang, China People's Police University
Shoumin Zhang, Hebei Key Laboratory of Information Support Technology for Smart Policing, China People's Police University
Na Li, Guangzhou Immigration Border Inspection General Station
Jing Liu, North China Institute of Aerospace
Peng Zhang, China People's Police University, Hebei Key Laboratory of Information Support Technology for Smart Policing (ORCID: https://orcid.org/0009-0005-2930-6956)

DOI:

https://doi.org/10.4108/eetsis.10318

Keywords:

Internet public opinion management, Online violent speech detection, Text classification, ERNIE, Attention mechanism

Abstract

INTRODUCTION: The Internet's anonymity and its reach across time and space have made online violent speech both more rampant and more covert, so accurate and effective management of online public opinion is of great significance. In recent years, scholars in China and abroad have studied online violent speech detection extensively, but extracting semantics from the diverse and implicit expressions found in Chinese online violent short texts remains a challenge.

OBJECTIVES: This paper proposes the EBLA model for online violent speech detection. The model combines the ERNIE knowledge-enhanced semantic understanding pre-trained model with a BiLSTM-Attention network to identify the relevant textual semantic information precisely and to give online content moderators an effective method.

METHODS: The model is trained on publicly available Chinese datasets related to online violence. It places a BiLSTM layer with an integrated attention mechanism on top of the ERNIE pre-trained model to strengthen deep, sentence-level feature extraction. The model has three phases: vector transformation, deep text feature extraction, and text classification prediction.
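The attention step in the feature extraction phase can be illustrated with a minimal sketch. The abstract does not give the exact attention formulation, so the dot-product scoring against a single context vector used here is an assumption, and the names `attention_pool`, `hidden`, and `context` are hypothetical. The `hidden` argument stands in for per-token BiLSTM outputs computed over ERNIE embeddings; it is not produced by a real ERNIE or BiLSTM model here.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(hidden: List[List[float]], context: List[float]) -> List[float]:
    """Score each token vector against a context vector (assumed dot-product
    scoring) and normalize the scores into attention weights."""
    scores = [sum(h_d * c_d for h_d, c_d in zip(h, context)) for h in hidden]
    return softmax(scores)


def attention_pool(hidden: List[List[float]], context: List[float]) -> List[float]:
    """Collapse per-token vectors into one sentence vector as the
    attention-weighted sum of the token vectors."""
    weights = attention_weights(hidden, context)
    dim = len(hidden[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]


# Hypothetical stand-in for BiLSTM outputs: two tokens, two dimensions each.
hidden_states = [[1.0, 0.0], [0.0, 1.0]]
# A zero context vector scores every token equally, so pooling is a plain mean.
uniform = attention_pool(hidden_states, [0.0, 0.0])
# A context vector aligned with the first token shifts the weight toward it.
focused = attention_pool(hidden_states, [10.0, 0.0])
```

In a trained model the context vector would be learned, and the pooled sentence vector would feed the classification layer. The sketch only shows how attention reweights token-level features into a sentence-level one.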

RESULTS: On the Chinese online violence detection task, the model's precision exceeds that of the BERT pre-trained model by 3.7% and that of BiLSTM combined with an attention mechanism by 13.84%. Empirical studies on additional datasets confirm the model's robustness and transferability.
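For reference, precision is the share of texts flagged as violent that are actually violent, TP / (TP + FP). The sketch below assumes binary precision for the positive (violent) class. The abstract does not state the averaging scheme, nor whether the reported gaps are percentage points or relative improvements. The labels are made up for illustration and are not the paper's data.

```python
from typing import Sequence


def precision(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Precision for one class: true positives over all predicted positives."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    predicted_pos = true_pos + false_pos
    # Return 0.0 when nothing is predicted positive, to avoid dividing by zero.
    return true_pos / predicted_pos if predicted_pos else 0.0


# Hypothetical gold labels and predictions (1 = violent, 0 = non-violent).
gold = [1, 0, 1, 1, 0]
pred = [1, 1, 1, 0, 0]
score = precision(gold, pred)  # 2 true positives out of 3 predicted positives
```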

CONCLUSION: The EBLA model provides a strong basis for online violent speech detection, though it has limitations: it does not account for identity bias or for the dynamic nature of speech. Future improvements will focus on multimodal analysis and dynamic monitoring capabilities.

