Deep Learning Approaches for English-Marathi Code-Switched Detection

Authors

  • Shreyash Bhimanwar COEP Technological University
  • Onkar Viralekar COEP Technological University
  • Koustubh Anturkar COEP Technological University
  • Ashwini Kulkarni COEP Technological University

DOI:

https://doi.org/10.4108/eetsis.3972

Keywords:

Code-Switching, Deep Learning, Log-Mel Spectrogram, Long Short-Term Memory, LSTM, Mel Frequency Cepstral Coefficients, MFCC, Neural Network, Perceptual Linear Prediction, PLP, Spoken Language Identification, Speech Recognition

Abstract

During conversations, speakers in multilingual societies frequently switch between two or more spoken languages. This linguistic behaviour, known as "code-switching", alternates between or mixes two or more languages. Little attention has been paid to developing software or tools for detecting code-switching. This paper proposes deep-learning-based methods for detecting code-switching in English-Marathi data. These methods can be applied in a range of settings, including phone call merging, intelligent AI assistants, intelligent travel systems that help travellers with navigation and reservations, and call centres handling customer-service issues. To build a code-switching detection system, we present a detailed analysis of several extracted audio features: the Mel-spectrogram, Mel-frequency cepstral coefficients (MFCC), and perceptual linear predictive (PLP) coefficients. We evaluated our methods on an English-Marathi code-switched dataset created by our team. Our model achieved an accuracy of 92.99% using 40 MFCCs, with the energy coefficient serving as the zeroth coefficient.
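
The feature behind the headline result, 40 MFCCs whose zeroth coefficient is replaced by the frame energy, can be sketched in NumPy as below. This is a minimal illustration, not the authors' pipeline: the 16 kHz sampling rate, 25 ms frames with a 10 ms hop, Hamming window, 512-point FFT, 40 triangular mel filters, orthonormal DCT-II, the omission of pre-emphasis, and the use of the log sum of squared samples of each unwindowed frame as the energy term are all assumptions, and every function name is hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other mel formulas exist)
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters evenly spaced on the mel scale, shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge of the triangle
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of the triangle
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb


def frame_signal(x, frame_len, hop):
    """Slice a 1-D signal into overlapping frames, shape (n_frames, frame_len)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def dct_matrix(n_out, n_in):
    """Orthonormal DCT-II basis, shape (n_out, n_in)."""
    n = np.arange(n_in)[None, :]
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    basis[0] *= np.sqrt(1.0 / n_in)
    basis[1:] *= np.sqrt(2.0 / n_in)
    return basis


def mfcc_with_energy(x, sr=16000, n_mfcc=40, n_filters=40,
                     frame_ms=25, hop_ms=10, n_fft=512, eps=1e-10):
    """Return (n_frames, n_mfcc) MFCCs whose column 0 is the log frame energy."""
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    raw = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    windowed = raw * np.hamming(frame_len)
    # Power spectrum of each windowed frame, shape (n_frames, n_fft // 2 + 1)
    power = np.abs(np.fft.rfft(windowed, n=n_fft)) ** 2 / n_fft
    # Log mel-band energies, shape (n_frames, n_filters)
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + eps)
    # Cepstral coefficients via DCT-II over the mel bands
    ceps = log_mel @ dct_matrix(n_mfcc, n_filters).T
    # c0 from the DCT is proportional to the mean log mel energy;
    # replace it with the log energy of the raw (unwindowed) frame.
    ceps[:, 0] = np.log(np.sum(raw ** 2, axis=1) + eps)
    return ceps
```

For a 1-second, 16 kHz clip, `mfcc_with_energy(signal)` returns a 98 x 40 matrix, one row per frame, which could then be fed frame by frame to a sequence model such as an LSTM. Because the energy term is in column 0, scaling the input amplitude shifts only that column by a constant, which keeps loudness information separate from the spectral-shape coefficients.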

Published

25-09-2023

How to Cite

Bhimanwar S, Viralekar O, Anturkar K, Kulkarni A. Deep Learning Approaches for English-Marathi Code-Switched Detection. EAI Endorsed Scal Inf Syst [Internet]. 2023 Sep. 25 [cited 2024 Sep. 13];11(3). Available from: https://publications.eai.eu/index.php/sis/article/view/3972