PyConDE & PyData Berlin 2024

The AI Revolution Will Not Be Monopolized: How open-source beats economies of scale, even for LLMs
2024-04-24 , B07-B08

With the latest advancements in Natural Language Processing and Large Language Models (LLMs), and big companies like OpenAI dominating the space, many people wonder: Are we heading further into a black box era with larger and larger models, obscured behind APIs controlled by big tech monopolies?

I don’t think so, and in this talk, I’ll show you why. I’ll dive deeper into the open-source model ecosystem, some common misconceptions about use cases for LLMs in industry, practical real-world examples and how basic principles of software development such as modularity, testability and flexibility still apply. LLMs are a great new tool in our toolkits, but the end goal remains to create a system that does what you want it to do. Explicit is still better than implicit, and composable building blocks still beat huge black boxes.


As ideas develop, we’re seeing more and more ways to use compute efficiently, producing AI systems that are cheaper to run and easier to control. In this talk, I'll share some practical approaches that you can apply today. If you’re trying to build a system that does a particular thing, you don’t need to transform your request into arbitrary language and call into the largest model that understands arbitrary language the best. The people developing those models are telling that story, but the rest of us aren’t obliged to believe them.


Expected audience expertise: Domain:

Intermediate

Expected audience expertise: Python:

Novice

Abstract as a tweet (X) or toot (Mastodon):

Are we heading further into a black box era with larger and larger models, obscured behind APIs controlled by big tech monopolies? I don’t think so, and in this talk, I’ll show you why.

Ines Montani is a developer specializing in tools for AI and NLP technology. She’s the co-founder and CEO of Explosion and a core developer of spaCy, a popular open-source library for Natural Language Processing in Python, and Prodigy, a modern annotation tool for creating training data for machine learning models.