ALGORITHMIC LITERACY: Generative Artificial Intelligence Technologies for Data Librarians
DOI: https://doi.org/10.4108/eetsis.4067
Keywords: Generative Pre-trained Transformer, Algorithmic Literacy, Python, Open AI, Data Librarian
Abstract
INTRODUCTION: Artificial intelligence (AI) is a novel type of library technology. AI technologies and the needs of data librarians are hybrid and symbiotic, because academic libraries must integrate AI technologies into their information and data services. Library services need AI to interpret the context of big data.
OBJECTIVES: In this context, we explore the use of the OpenAI Codex, a deep learning model trained on Python code from repositories, to generate code scripts for data librarians. This investigation examines the practices, models, and methodologies for obtaining code script insights from complex code environments linked to AI GPT technologies.
METHODS: The proposed AI-powered method aims to assist data librarians in creating code scripts using Python libraries and tools such as the PyCharm integrated development environment, with additional support from the Machinet AI and Bito AI plugins. The process involves collaboration between the data librarian and the AI agent: the librarian provides a natural language description of the programming problem, and the OpenAI Codex generates the solution code in Python.
RESULTS: Five specific web-scraping problems are presented. The scripts demonstrate how to extract data, calculate metrics, and write the results to files.
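The abstract does not reproduce the five scripts, so the following is only a minimal sketch of the extract, calculate, and write pattern described above, not one of the study's scripts. It uses the Python standard library alone; the sample HTML, the example.org URLs, the chosen metrics, and every function name are assumptions made for illustration.

```python
import csv
import io
import sys
from html.parser import HTMLParser

# Hypothetical page fragment standing in for a scraped web collection.
SAMPLE_HTML = """
<ul>
  <li><a href="https://example.org/dataset/1">Survey data 2021</a></li>
  <li><a href="https://example.org/dataset/2">Climate records</a></li>
  <li><a href="https://example.org/dataset/3">Library loans</a></li>
</ul>
"""


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Anchors without an href leave _href as None and are skipped.
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Step 1: extract data (URL and title of each link)."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


def summarise(links):
    """Step 2: calculate simple metrics over the extracted records."""
    count = len(links)
    mean_len = sum(len(title) for _, title in links) / count if count else 0.0
    return {"count": count, "mean_title_length": mean_len}


def write_csv(links, handle):
    """Step 3: write the results to any file-like object as CSV."""
    writer = csv.writer(handle)
    writer.writerow(["url", "title"])
    writer.writerows(links)


if __name__ == "__main__":
    records = extract_links(SAMPLE_HTML)
    print(summarise(records))
    write_csv(records, sys.stdout)
```

In practice the HTML would come from a live page rather than an inline string, and the output handle would be a file opened with `open(path, "w", newline="")`; both are left out here so the sketch runs without network access.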
CONCLUSION: Overall, this study highlights the application of AI in assisting data librarians with code script creation for web scraping tasks. AI may be a valuable resource for data librarians dealing with big data challenges on the Web. The possibility of creating Python code with AI is of great value, as AI technologies can help data librarians work with various types of data sources. The Python code in Data Science web scraping projects uses a machine-learning model that can generate human-like code to help create and improve the library service for extracting data from a web collection. The ability of data librarians without programming experience to use AI technologies facilitates their interactions with all types of data sources. The Python ecosystem offers artificial intelligence modules, packages, and plugins, such as the OpenAI Codex, that support automating navigation in web browsers to simulate human behaviour on pages by entering passwords, selecting captcha options, collecting data, and creating different collections of datasets to be viewed.
License
Copyright (c) 2024 Alexandre Semeler, Adilson Luiz Pinto, Tibor Koltay, Thiago Dias, Arthur Longoni Oliveira, José Antonio Moreiro González, Helen Beatriz Frota Rozados
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
This is an open access article distributed under the terms of the CC BY-NC-SA 4.0, which permits copying, redistributing, remixing, transforming, and building upon the material in any medium so long as the original work is properly cited.