Supervised Urdu Word Segmentation Model Based on POS Information

Sadiq Nawaz Khan; Khairullah Khan; Wahab Khan

doi:10.4108/eai.19-6-2018.155444

Authors

Sadiq Nawaz Khan, University of Science and Technology Bannu
Khairullah Khan, University of Science and Technology Bannu
Wahab Khan, Isles International University

DOI:

https://doi.org/10.4108/eai.19-6-2018.155444

Keywords:

Urdu, word segmentation, supervised learning, conditional random fields

Abstract

Urdu is the national language of Pakistan and is among the most widely spoken and understood languages in the world. Successful Urdu NLP requires robust, high-performance tools and resources. Word segmentation plays a central role for morphologically rich languages such as Urdu across diverse NLP tasks, including named entity recognition, sentiment analysis, part-of-speech tagging, and information retrieval. The morphological richness of Urdu makes word segmentation harder, because a single word can consist of zero or more prefixes, a stem, and zero or more suffixes. In this paper we present a supervised Urdu word segmentation scheme based on the part-of-speech (POS) information of the corresponding words. The experiments use conditional random fields (CRF) with contextual features. The proposed system is evaluated on 300K words, and the results show clear improvements over the baseline approach.
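The abstract does not spell out the feature set, so the following is only a minimal sketch of what "contextual features" built from POS information could look like as input to a CRF. Everything in it is an assumption made for illustration, not the authors' method: the function names `token_features`, `sentence_features`, and `merge_by_labels`, the window size, the B/I labelling scheme (B starts a word, I continues the previous one), the POS tag names, and the romanized example tokens.

```python
from typing import Dict, List


def token_features(tokens: List[str], pos_tags: List[str], i: int,
                   window: int = 1) -> Dict[str, str]:
    """Build a feature dict for token i from its own form and POS tag
    plus the forms and POS tags of neighbours inside the window."""
    feats = {"bias": "1", "token": tokens[i], "pos": pos_tags[i]}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        if 0 <= j < len(tokens):
            feats[f"token[{offset:+d}]"] = tokens[j]
            feats[f"pos[{offset:+d}]"] = pos_tags[j]
        else:
            # Mark positions that fall outside the sentence.
            feats[f"pad[{offset:+d}]"] = "BOS" if j < 0 else "EOS"
    return feats


def sentence_features(tokens: List[str], pos_tags: List[str],
                      window: int = 1) -> List[Dict[str, str]]:
    """Feature dicts for every token of one sentence."""
    if len(tokens) != len(pos_tags):
        raise ValueError("tokens and pos_tags must have the same length")
    return [token_features(tokens, pos_tags, i, window)
            for i in range(len(tokens))]


def merge_by_labels(tokens: List[str], labels: List[str]) -> List[str]:
    """Join tokens into words: 'I' glues a token onto the previous word,
    any other label starts a new word."""
    words: List[str] = []
    for tok, lab in zip(tokens, labels):
        if lab == "I" and words:
            words[-1] += tok
        else:
            words.append(tok)
    return words


if __name__ == "__main__":
    # Hypothetical romanized sentence in which one word is written as two
    # space-separated pieces ("khoob" + "soorat").
    tokens = ["woh", "khoob", "soorat", "hai"]
    pos_tags = ["PRON", "ADJ", "NOUN", "VERB"]  # illustrative tag set
    for feats in sentence_features(tokens, pos_tags):
        print(feats)
    print(merge_by_labels(tokens, ["B", "B", "I", "B"]))
```

Feature dicts of this shape are what common CRF toolkits accept as per-token input. In this sketch a trained sequence labeller would predict the B/I labels, and `merge_by_labels` would then turn them back into segmented words.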

Published

10-09-2018

Issue

Vol. 5 No. 19 (2018): EAI Endorsed Transactions on Scalable Information Systems

Section

Research articles

License

This work is licensed under a Creative Commons Attribution 3.0 Unported License.

This is an open access article distributed under the terms of the CC BY-NC-SA 4.0, which permits copying, redistributing, remixing, transformation, and building upon the material in any medium so long as the original work is properly cited.

How to Cite

Khan SN, Khan K, Khan W. Supervised Urdu Word Segmentation Model Based on POS Information. EAI Endorsed Scal Inf Syst [Internet]. 2018 Sep. 10 [cited 2026 Jul. 25];5(19):e2. Available from: https://publications.eai.eu/index.php/sis/article/view/2185
