Real-Time Monitoring of Data Pipelines: Exploring and Experimentally Proving that the Continuous Monitoring in Data Pipelines Reduces Cost and Elevates Quality
DOI:
https://doi.org/10.4108/eetsis.5065
Keywords:
Data pipelines, monitoring, real-time, data observability, data quality, anomaly detection
Abstract
Data pipelines are crucial for processing and transforming data in various domains, including finance, healthcare, and e-commerce. Ensuring the reliability and accuracy of data pipelines is essential for maintaining data integrity and making informed business decisions. In this paper, we explore the significance of continuous monitoring in data pipelines and its contribution to data observability. This work discusses the challenges associated with monitoring data pipelines in real time, proposes a framework for real-time monitoring, and highlights its benefits in enhancing data observability. The findings emphasize the need for organizations to adopt continuous monitoring practices to ensure data quality, detect anomalies, and improve overall system performance.
License
Copyright (c) 2023 Shammy Narayanan, Maheswari S, Prisha Zephan
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
This is an open access article distributed under the terms of the CC BY-NC-SA 4.0, which permits copying, redistributing, remixing, transforming, and building upon the material in any medium so long as the original work is properly cited.