Fortified MapReduce Layer: Elevating Security and Privacy in Big Data
DOI:
https://doi.org/10.4108/eetsis.3859
Keywords:
Big Data, Privacy, Security, HDFS, Map Reduce
Abstract
In today's digital landscape, the widespread sharing and use of raw data are integral to social, medical, agricultural, and academic domains. The surge of open platforms has led to exponential growth in data, giving rise to what is now called Big Data (BD). However, the traditional BD model lacks a specific mechanism for capturing the sensitivity of data, leaving it vulnerable to breaches. To address this, a privacy and security layer is crucial. This paper proposes a novel solution called the Fortified Secured Map Reduce (FSMR) Layer, which serves as an intermediary between the HDFS (Hadoop Distributed File System) layer and the MR (Map Reduce) layer. The FSMR model is designed to foster data sharing for knowledge mining while providing privacy and security guarantees. It addresses scalability issues concerning privacy and strikes a balance between privacy and utility for data miners. With the FSMR model, we achieve improvements in running time and information loss compared to existing approaches. Storage and CPU utilization are also reduced, improving the overall efficiency of the data processing pipeline. Our work promotes data sharing while safeguarding sensitive information, making it a significant step towards secure and privacy-conscious BD processing.
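The abstract places a privacy step between storage and the MapReduce computation but does not say how that step works. The sketch below is only a toy illustration of that placement, not the FSMR algorithm: it assumes, purely for illustration, that the privacy step drops a direct identifier and generalizes age into ten-year bands before records reach the map phase. The function names (privacy_layer, map_phase, reduce_phase, run_job) and the sample records are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

Record = Dict[str, object]
Key = Tuple[str, str]


def privacy_layer(records: Iterable[Record]) -> Iterator[Record]:
    """Hypothetical privacy step that sits between storage and MapReduce.

    Assumption for illustration only: drop the 'name' field and
    generalize 'age' into a ten-year band.
    """
    for record in records:
        low = (int(record["age"]) // 10) * 10
        yield {
            "age_band": f"{low}-{low + 9}",
            "diagnosis": record["diagnosis"],
        }


def map_phase(records: Iterable[Record]) -> Iterator[Tuple[Key, int]]:
    """Emit ((age_band, diagnosis), 1) for every sanitized record."""
    for record in records:
        yield (str(record["age_band"]), str(record["diagnosis"])), 1


def reduce_phase(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, int]:
    """Sum the emitted counts per key."""
    totals: Counter = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_job(raw_records: Iterable[Record]) -> Dict[Key, int]:
    """Raw records never reach map/reduce without passing the privacy step."""
    return reduce_phase(map_phase(privacy_layer(raw_records)))


# Made-up sample data standing in for records read from storage.
raw_records = [
    {"name": "A", "age": 34, "diagnosis": "flu"},
    {"name": "B", "age": 37, "diagnosis": "flu"},
    {"name": "C", "age": 52, "diagnosis": "asthma"},
]

result = run_job(raw_records)
print(result)
```

The point of the design is that the analysis code only ever sees sanitized records, so the trade-off between privacy and utility is decided in one place (here, the width of the age band) rather than inside each job.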
License
Copyright (c) 2023 Manish Gupta, Rajendra Kumar Dwivedi
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
This is an open access article distributed under the terms of the CC BY-NC-SA 4.0, which permits copying, redistributing, remixing, transformation, and building upon the material in any medium so long as the original work is properly cited.