Real-Time Monitoring of Data Pipelines: Exploring and Experimentally Proving that the Continuous Monitoring in Data Pipelines Reduces Cost and Elevates Quality

Authors

  • Shammy Narayanan, Thryve Digital LLP
  • Maheswari S, Vellore Institute of Technology University
  • Prisha Zephan, Sathyabama Institute of Science and Technology

DOI:

https://doi.org/10.4108/eetsis.5065

Keywords:

Data pipelines, monitoring, real-time, data observability, data quality, anomaly detection

Abstract

Data pipelines are crucial for processing and transforming data in various domains, including finance, healthcare, and e-commerce. Ensuring their reliability and accuracy is essential to maintaining data integrity and making informed business decisions. In this paper, we explore the significance of continuous monitoring in data pipelines and its contribution to data observability. We discuss the challenges of monitoring data pipelines in real time, propose a framework for real-time monitoring, and highlight its benefits in enhancing data observability. The findings emphasize the need for organizations to adopt continuous monitoring practices to ensure data quality, detect anomalies, and improve overall system performance.

Published

07-02-2024

How to Cite

1.
Narayanan S, S M, Zephan P. Real-Time Monitoring of Data Pipelines: Exploring and Experimentally Proving that the Continuous Monitoring in Data Pipelines Reduces Cost and Elevates Quality. EAI Endorsed Scal Inf Syst [Internet]. 2024 Feb. 7 [cited 2024 May 19];11(4). Available from: https://publications.eai.eu/index.php/sis/article/view/5065