A Hybrid Named Entity Recognition System for Aviation Text

Bharathi A; Robin Ramdin; Preeja Babu; Vijay Krishna Menon; Chandrasekhar Jayaramakrishnan; Sudarasan Lakshmikumar

doi:10.4108/eetsis.4185

Affiliation

All authors: KeepFlying


Keywords:

Named Entity Recognition, Machine Learning, Aviation Herald, spaCy NER, GPE, Rule Augmentation

Abstract

Named Entity Recognition (NER) is a crucial task in Natural Language Processing (NLP) that aims to identify and categorize named entities in text. Although NER has been well studied in many domains, it remains challenging in new domains where annotated data is scarce. In this paper, we propose an NER system for the aviation domain that addresses this challenge. Our system combines rule-based and supervised methods to build a model with little to no manual annotation. We evaluate the system on a benchmark dataset, where it outperforms the baseline scores and achieves competitive results. To the best of our knowledge, this is the first study to develop an NER system that specifically targets aviation entities. Our findings highlight the potential of the proposed system for aviation NER and pave the way for future research in this area.
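The abstract says that rule-based and supervised methods are combined so that little manual annotation is needed. A common way to do this is to let hand-written rules auto-label raw text and then use those "silver" labels to train a statistical NER model such as spaCy's. Treat this as an assumption, not the authors' documented pipeline. The sketch below covers only the rule-based labeling step, in plain Python. The gazetteer entries, the flight-number regex, and the labels AIRLINE, AIRCRAFT and FLIGHT are hypothetical. GPE is the standard geopolitical-entity label listed in the keywords.

```python
import re
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)

# Hypothetical gazetteer: entries and labels are illustrative, not the paper's.
GAZETTEER: Dict[str, str] = {
    "Airbus A320": "AIRCRAFT",
    "A320": "AIRCRAFT",
    "Boeing 737": "AIRCRAFT",
    "Lufthansa": "AIRLINE",
    "Frankfurt": "GPE",
}

# Hypothetical pattern: two uppercase letters, an optional hyphen, then 1-4 digits
# (e.g. "LH-400" or "LH400"), used here as a stand-in for flight numbers.
PATTERNS = [
    (re.compile(r"\b[A-Z]{2}-?\d{1,4}\b"), "FLIGHT"),
]


def rule_annotate(text: str) -> List[Span]:
    """Label text with gazetteer and regex rules, keeping the longest non-overlapping matches."""
    candidates: List[Span] = []
    for term, label in GAZETTEER.items():
        # Whole-word match of the literal gazetteer term.
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", text):
            candidates.append((m.start(), m.end(), label))
    for pattern, label in PATTERNS:
        for m in pattern.finditer(text):
            candidates.append((m.start(), m.end(), label))
    # Prefer longer spans, then earlier ones; reject any span that overlaps an accepted one.
    candidates.sort(key=lambda s: (-(s[1] - s[0]), s[0]))
    accepted: List[Span] = []
    for start, end, label in candidates:
        if all(end <= a_start or start >= a_end for a_start, a_end, _ in accepted):
            accepted.append((start, end, label))
    return sorted(accepted)


def build_silver_data(texts: List[str]) -> List[Tuple[str, Dict[str, List[Span]]]]:
    """Turn raw sentences into (text, {"entities": spans}) training examples."""
    return [(t, {"entities": rule_annotate(t)}) for t in texts]


if __name__ == "__main__":
    sentence = "Lufthansa flight LH-400, an Airbus A320, returned to Frankfurt."
    print(rule_annotate(sentence))
```

The output tuples use character offsets, which is the shape spaCy v3's `Example.from_dict` accepts under its `entities` key. They should therefore be convertible into training examples for a supervised model. Resolving overlaps by longest match is one simple design choice, so that "Airbus A320" wins over the shorter gazetteer entry "A320".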
