Drug classification system based on drug composition and usage instructions

Hoang-Dieu Vu; Vu-Hien Pham; Quang-Dung Le

doi:10.4108/eetinis.v12i1.5995

Authors

Hoang-Dieu Vu, Phenikaa University
Vu-Hien Pham, Phenikaa University
Quang-Dung Le, Phenikaa University

DOI:

https://doi.org/10.4108/eetinis.v12i1.5995

Keywords:

Drug classification, NLP, Drug Composition

Abstract

This study presents a natural language processing (NLP) approach to classifying drugs from descriptions of their composition and usage. NLP techniques including text preprocessing, word embedding, and deep learning were applied to a Vietnamese drug dataset. Traditional machine learning models such as Support Vector Machines (SVM) and deep models including Bidirectional Long Short-Term Memory (BiLSTM) and PhoBERT were evaluated. Because the data we collected contain limited information, data augmentation techniques were applied to increase the variation of the dataset. PhoBERT achieved 95% accuracy, highlighting the benefit of transferring knowledge from large language models. Errors occurred primarily between similar drug categories, suggesting that refining the taxonomy could improve performance. In summary, we developed an automated drug classification framework built on state-of-the-art NLP, demonstrating the feasibility of analyzing drug data at scale and aiding therapeutic understanding. These results support the potential of NLP in pharmacovigilance applications.
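The abstract does not say which augmentation techniques were used. As an illustration only, the sketch below shows two common token-level operations, random swap and random deletion, applied to a drug description. The function names, the default parameters, and the whitespace tokenization are assumptions made for this example; Vietnamese text would normally be word-segmented first, and this is not the authors' pipeline.

```python
import random


def random_swap(tokens, n_swaps, rng):
    """Return a copy of tokens with n_swaps random pairs of positions exchanged."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_delete(tokens, p, rng):
    """Drop each token with probability p, keeping at least one token if any exist."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def augment(text, n_copies=2, p_delete=0.1, n_swaps=1, seed=0):
    """Generate n_copies perturbed variants of a whitespace-tokenized text.

    Hypothetical helper: whitespace splitting is a simplification, and the
    default values are illustrative rather than taken from the paper.
    """
    rng = random.Random(seed)
    tokens = text.split()
    variants = []
    for _ in range(n_copies):
        perturbed = random_swap(tokens, n_swaps, rng)
        perturbed = random_delete(perturbed, p_delete, rng)
        variants.append(" ".join(perturbed))
    return variants
```

Each variant keeps the label of the original description, so calling `augment(description, n_copies=3)` on a training example yields three additional examples for the same drug class. Keeping the perturbation small (one swap, low deletion probability) is meant to reduce the risk of removing the ingredient names that distinguish one class from another.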
