A Cartography of Collaboration in Open Source AI: Mapping Collaboration in the Development and Reuse Lifecycle of 12 Open Large Language Models
OFA Symposium 2025: Open Technology Impact in Uncertain Times

18/11/2025 16:40–17:10, Main Room

The proliferation of open large language models (LLMs) is fostering a vibrant ecosystem of research and innovation in artificial intelligence (AI). However, the methods of collaboration used to develop open LLMs both before and after their public release have not yet been comprehensively studied. This limits our understanding of how open LLM projects are initiated, organized, and governed, and of what opportunities exist to foster this ecosystem further. We address this gap through an exploratory analysis of open collaboration throughout the development and reuse lifecycle of open LLMs. We draw on semi-structured interviews with the developers of 14 open LLMs from grassroots projects, research institutes, startups, and Big Tech companies in North America, Europe, Africa, and Asia. We make three key contributions to research and practice. First, collaboration in open LLM projects extends far beyond the LLMs themselves. It encompasses datasets, benchmarks, open source frameworks, leaderboards, knowledge-sharing and discussion forums, and compute partnerships, among others. Second, open LLM developers have a variety of social, economic, and technological motivations, from democratizing AI access and promoting open science to building regional ecosystems and expanding language representation. Third, the sampled open LLM projects exhibit five distinct organizational models, ranging from single-company projects to non-profit-sponsored grassroots projects. These models vary in their centralization of control and in the community engagement strategies used throughout the open LLM lifecycle. We conclude with practical recommendations for stakeholders seeking to support the global community building a more open future for AI.

As generative artificial intelligence (AI) models become increasingly prevalent and are released under various open and permissive licenses, there is a critical need to understand how they are built and who contributes to this process. Currently, little research maps open collaboration practices across the different stages of developing or reusing models and their constituent artifacts (e.g., training datasets, software, model weights, evaluation benchmarks). This research aims to map and characterize open collaboration (specifically, the “collaboration on-ramps”) at different stages in the development and reuse lifecycle of open generative AI models, with a focus on open large language models (LLMs). The study draws on qualitative interviews with 12 open LLM developers: Allen Institute for AI, EleutherAI, Cohere Labs, Hugging Face, Meta, Alibaba, the BigScience Workshop, AI Singapore, SpeakLeash, SCB 10X, Fraunhofer IAIS, and the National Library of Norway. From these, it presents a comprehensive cartography of collaboration practices throughout the lifecycle of open LLMs. The cartography spans diverse organizational contexts, from grassroots initiatives to large technology companies, and multiple world regions. The study gives researchers, developers, business leaders, policymakers, and the wider community empirical insights into collaboration practices in the emerging open source AI community, including motivations, opportunities, and challenges. It also offers practical recommendations for participating in or promoting open source AI collaboration.

Johan Linåker

Johan Linåker is a Senior Researcher at RISE Research Institutes of Sweden and an Adjunct Assistant Professor at Lund University.


Cailean Osborne

Cailean Osborne is a Senior Researcher at the Linux Foundation, where he conducts strategic research and advocacy for promoting openness in AI. He has a PhD in Social Data Science from the University of Oxford, where he researched collaboration dynamics in the open source AI ecosystem. During his PhD, he was a visiting researcher at the Open Source Software Data Analytics Lab at Peking University. Previously, he was the International Policy Lead at the UK Government's Centre for Data Ethics and Innovation, where he co-authored the UK's National AI Strategy and served as a UK Delegate at intergovernmental AI governance initiatives at the OECD and Council of Europe. He is based in Berlin, Germany.

This speaker also appears in:

Q&A Panel: Open Source and AI
