Defining Open Source AI: Current Conversations within the Academic Community
The Open Source Initiative (OSI), a California public benefit corporation and not-for-profit community of technology experts, recently published the Open Source AI Definition – draft v. 0.0.9. The definition centers on the importance of AI systems that can be used, modified, and shared for any purpose, and studied and inspected transparently.
HELIOS Open consulted its Advisory Committee about the definition, hearing from committee and OSI board member, Sayeed Choudhury, at the September meeting. Choudhury shared information about the process OSI led to define open source AI (*see below for specific demographic and community participation information).
While the Mozilla Foundation recently endorsed OSI’s community-led definition, asserting it “is critical not just for redefining what ‘open source’ means in the context of AI; it’s about shaping the future of the technology and its impact on society,” the definition’s release and the nature of its development are not without critics: Meta, for example, has developed its own open source AI definition, based on its Llama model. According to Choudhury, “it should be noted that Meta’s definition does not comply with the Open Source AI Definition. Fundamentally, the question relates to whether any single company or entity should be able to define something which will have such a profound impact on society.”
HELIOS Open, by design, does not endorse projects or initiatives, and individuals in the group were split on their support for both the initial definition and the process for developing it; however, one important consensus takeaway emerged:
We, as a community of higher education representatives, agree that one, single commercial entity should not define open source AI behind closed doors. In general, we should contribute to inclusive community-led efforts that center ethical AI values like openness/transparency and trust.
HELIOS Open Advisory Committee Recommendations and Considerations
Upon initial consideration of the definition, the group discussed recommendations, including improving the language around training data and its openness, which was developed with data privacy, copyright, and practicality considerations in mind. Stephen Jacobs, HELIOS Open Advisory Committee member and Director of Open@RIT, shared, “If the definition doesn’t start by emphasizing the openness of training data out of the gate, I worry it will not get added in later. Over time, negotiation tends to narrow, not broaden, definitions.” OSI has shared that the definition and the work behind it are evolving, and conversation ensued about the differences between open source software and open source AI, including nuances with AI model code and training data and whether it is even possible to have full transparency into how an AI system is trained.
Some members believe that the OSI definition offers a standard to consider as the definition evolves. Future changes should be prompted by evaluating the definition against models, tracking changes in our understanding of the engineering aspects of these models, and (hopefully) assessing the merits of building and using publicly available data. Future conversations might also generate better understanding of issues related to reproducibility and replicability to better support our understanding of open source AI and scientific integrity.
HELIOS Open and Incentivizing Open Work
AI is an emerging priority area for higher education leaders. In discussions about the core components of an AI system, the HELIOS Open Advisory Committee reflected on the importance of AI systems that are also trustworthy, inclusive, and transparent. Tying directly to HELIOS Open’s goals to ensure campuses are incentivizing open scholarship practices in service of a more equitable and trustworthy research ecosystem, the group also validated the need to incentivize open work within academia, including contributions to open source, individuals sharing research outputs openly, and individuals developing open curriculum or educational materials, among other activities.
HELIOS Open will be watching this space to better understand how open source AI might play a role in the academic setting.
Why Values and Ethics in AI Matter to Campus Leaders
As our research enterprise has become increasingly invested in AI, we emphasize the importance of AI systems that prioritize ethical AI considerations and values. Campus leaders are not only creating new positions for faculty, staff, and postdocs focused on AI, but also leveraging AI tools to maximize student outcomes and to promote research impact and efficiency. A definition can start a conversation and exploration, and one that is community-derived and values-aligned can help with future evaluation exercises.
Within academic institutions, libraries on campus are grappling with publisher AI restrictions that seek to limit institutions’ ability to use AI on publisher content. Many of the same companies are rapidly integrating AI into their own systems, often in ways that are not fully transparent and sometimes with limited notice. Particularly without transparency, vendors’ use of AI could pose new concerns, as we noted with regard to institutional data such as faculty activity reports in our analysis of the Barcelona Declaration. Campuses must be able to trust the systems that they rely on—regardless of whether they are commercial or non-commercial—and the implementation of AI in these systems should ensure that institutions’ interests are protected. Pursuing open source AI in research systems can be an important step in addressing these issues.
The federal government, which annually funds approximately $90 billion in academic research, is also invested in trustworthy AI systems. In May, the National Science Foundation's (NSF) pilot of the National AI Research Resource (NAIRR) announced its first round of awards to 35 projects and opened the application process for the second round of awards. The NAIRR pilot was established by President Biden's Executive Order on the Safe, Secure and Trustworthy Development and Use of AI and aims to democratize access to AI tools for all communities, specifically researchers at institutions of higher education, students, and small businesses.
HELIOS Open values community efforts that support openness and transparency for public benefit. Open source has long been an important component of digital infrastructure. Recent estimates suggest that 96% of all software contains open source components. A Harvard Business School study noted that open source software has generated $8.8 trillion of value on the demand side and reduced production costs on the supply side by a factor of 3.5. A community-based, consensus open source AI definition could yield even more benefits for universities and society at large, so we will eagerly watch the community process unfold and continue urging higher education leaders to incentivize open work on campus.
---
*A note from an OSI board member on OSI’s process for defining open source AI: The OSAID process serves as a reference model for diverse AI-related community engagement. OSI facilitated a dozen town halls and working group meetings; established a public forum for discussion and debate; provided presentations and opportunities for feedback at three dozen public events on five continents; and engaged 36 co-design working group/systems review volunteers representing 23 countries by birth or residence.
Of the working group members, more than half (53%) are People of Color, more than one-third are Black, and slightly fewer (31%) are both Black and Indigenous. Women and femmes, including transgender women, account for 28% of the total, and 63% of those individuals are Women of Color. Of this same volunteer group, nearly 30% of active co-designers are from academia, either as faculty or graduate students.