Spotlight Series Recap: Generative AI and Open Scholarship

The Higher Education Leadership Initiative for Open Scholarship (HELIOS Open) hosted a Spotlight Series focused on generative artificial intelligence (AI) and open scholarship.

The World Economic Forum defines generative AI as “a category of artificial intelligence algorithms that generate new outputs based on the data they have been trained on. Unlike traditional AI systems that are designed to recognize patterns and make predictions, generative AI creates new content in the form of images, text, audio, and more.” In 2023, generative AI seemingly made the leap from a topic vaguely on our collective radars to everything everywhere all at once. ChatGPT has emerged as a tool for the masses, raising questions about how peer reviewers, editors, and professors will be able to discern between human- and machine-generated papers. Geoffrey Hinton, the so-called godfather of AI, quit Google with a stark warning about the ethical implications of AI. A raft of headlines like “Why We're Worried about Generative AI” appeared in prominent publications like Scientific American. Intrepid analysis revealed that the training datasets for a number of AI models are surprisingly small, pulling from heterogeneous sources.

Higher education institutions are becoming increasingly aware that these developments are of direct relevance to their missions and their research, including their open scholarship priorities. Three experts in this emerging space shared their insights.

First, Dr. Katie Shilton, Associate Professor in the College of Information Studies at the University of Maryland, College Park, and co-PI of the new NIST-NSF Institute for Trustworthy AI in Law & Society (TRAILS), explored ethical considerations of AI. For a long time, data has been a primary input of scholarship, and open scholarship has focused on making that data findable, accessible, interoperable, and reusable (FAIR) for other scholars. For generative AI, data is an input, but so are other research products like journal articles. The challenge is that generative AI itself is not scholarship, but a replication of the language of scholarship. Data and papers are no longer used only to make predictions or new discoveries, but also to feed large language models (LLMs).

This raises unprecedented ethical questions:

  • What are the right non-scientific generative uses of open scholarship?

  • What values or virtues do we want publicly-funded scholarship to support? How do we measure and track whether that is happening?

  • What roles do scholars play in managing the use of their scholarship?

Dr. Shilton posited that scholars may want to adopt open licenses that restrict their scholarship from some generative uses. There may be a need to develop licenses that require AI dashboards or other explanatory features, and FAIR data documentation practices are increasingly essential to support AI transparency.

Open scholarship is on the vanguard of these issues because open scholarship advocates are already thinking about the ethics of data sharing and can inform emerging norms. The principles this community develops could guide content creators and generators as scholars and institutions navigate their relationships with generative AI tools and companies.

The NIST-NSF TRAILS Institute will support these efforts.

Dr. Susan Aaronson, also a co-PI of TRAILS, is Research Professor of International Affairs and Director of the Digital Trade and Data Governance Hub at George Washington University. Dr. Aaronson posed key questions at the start of her talk:

  • How did the firms or organizations creating LLMs get the data to train their models?

  • Did they follow internationally accepted rules regarding copyright and personal data protection?

  • Is the data accurate, complete, and representative of the world, its people and their cultures?

  • Have the firms considered interoperability, transparency, data sovereignty, and values?

  • Are there considerations for policymakers and activists related to the expropriation of resources (data) and paying additional rents to big tech companies located in the West and China?

Dr. Aaronson explored the governance and regulation concerns for policymakers, scholars, and the public. The markets for AI are growing rapidly, but China and the U.S. hold 94% of all AI funding, with 73% of generative AI firms based in the U.S. According to Aaronson, these firms often use open-source methods, yet they rely on trade secrets to protect their algorithms and to control and reuse the data they analyze. Policymakers, researchers, and international bodies are starting to explore these challenges and opportunities.

Dr. Aaronson concluded with an exploration of how competition between the U.S. and China could threaten open practices like data sharing. Some U.S. policymakers and companies worry that openness empowers China; however, openness is a norm of science and can yield comparative advantage. Science cannot progress without data sharing and cooperation.

Dr. Molly Kleinman, Managing Director of the Science, Technology, and Public Policy program at the University of Michigan, shared thoughts on AI's potential implications for scientific research. She began by highlighting her main takeaway: generative AI is trained on the past and can only reproduce the past. Because generative AI can only “know” what it has been trained on, it cannot make anything truly novel, which has implications for its use in the conduct and evaluation of scientific research. It will be increasingly important to be able to evaluate what AI generates, and scientists and scientists in training should be educated early about what AI can and cannot do.

In scholarly communication, there are questions about AI and trust in science, trust on campuses between faculty and students, and how AI will be used in peer review, along with concerns about rightful authorship, research evaluation in tenure and promotion, and intellectual property. One thing generative AI is good at is boiling down large quantities of information into brief summaries, but it is not so good at citing its sources. Dr. Kleinman noted that these tools will privilege highly cited articles that may not represent a field's diversity or its most novel findings.

Using and feeding generative AI also risks reinforcing Western, especially Anglo-American, dominance in science. The more common generative AI becomes as a tool in science, the more it will continue to reinforce the English language as the language of science. It is reasonable to assume that already-marginalized scientists may worry their scholarship will be less visible in a world with global, popularized use of AI tools. The aforementioned issues may challenge open science and scholarship because of valid fears of exclusion and extraction.

Colleges and universities have a role to play in ensuring equity and inclusion are considered when AI is incorporated in scholarly processes like researcher evaluation and peer review.

Similarly, scholarly publishers can monetize AI tools for researchers built on publishers' proprietary content, selling that information back to institutions or holding these tools hostage unless institutions subscribe to their journals. Given the current issue of monopolies in scientific publishing and a small number of for-profit publishers' control over a large swath of scholarly content, this has financial and ethical implications for higher education.

We may see pressure on academic institutions to adopt and use generative AI to keep up with the trends, so Dr. Kleinman concluded by reemphasizing the importance of training scientists to think critically about, and recognize the limitations of, generative AI.
