2025-04-23, Hassium
Despite an array of regulations implemented by governments and social media platforms worldwide (e.g., the EU's Digital Services Act, DSA), the problem of digital abusive speech persists. At the same time, rapid advances in NLP and large language models (LLMs) are opening up new possibilities—and responsibilities—for using this technology to make a positive social impact. Can LLMs streamline content moderation efforts? Are they effective at spotting and countering hate speech, and can they help produce more proactive solutions like text detoxification and counter-speech generation?
In this talk, we will dive into cutting-edge research and current best practices in automatic textual content moderation. From clarifying core definitions to detailing actionable methods for leveraging multilingual NLP models, we will provide a practical roadmap for researchers, developers, and policymakers aiming to tackle the challenges of harmful online content. Join us to discover how modern NLP can foster safer, more inclusive digital communities.
The rise of large language models (LLMs) has revolutionized natural language processing (NLP), creating opportunities to address complex societal challenges, including the pervasive issue of harmful online content. Despite global regulations and platform-specific policies, abusive speech and toxic content continue to plague digital spaces, highlighting the need for smarter, scalable, and multilingual solutions.
This talk explores how modern NLP technologies can play a transformative role in content moderation, moving beyond traditional detection methods to proactive measures that promote healthier online interactions. We will cover key topics, including:
- Understanding the Landscape: Definitions and nuances of harmful content categories, including hate speech, misinformation, and harassment. We will draw on practices not only from the CS field, but also from our collaborations with social scientists and NGOs.
- Hate Speech Detection: Can LLMs detect hate speech? How can the models be adapted to new languages? (See the detection sketch after this list.)
- Text Detoxification: Diving into the nuances of toxicity across nine languages (from our recent shared task) and sharing best practices for prompting LLMs for text detoxification (see the prompting sketch after this list).
- Counter-Speech Generation: Our recent research results on how to make LLMs generate replies that actually address the targeted group, rather than a very generic "Please, it is not ok to talk like this" (see the counter-speech sketch after this list).
- Ethical Considerations: Who, in the end, is responsible for content moderation? How can the community help establish best practices? And how do we measure the "effectiveness" of LLMs for content moderation?
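
To make the detection bullet concrete, here is a minimal sketch of the kind of pipeline the talk touches on. It is an illustrative assumption, not the talk's actual system: the publicly available `unitary/toxic-bert` checkpoint and the 0.5 threshold are arbitrary example choices, and for new languages a multilingual checkpoint could be swapped in.

```python
# Minimal toxicity-flagging sketch; checkpoint and threshold are
# illustrative assumptions, not the talk's actual pipeline.
from transformers import pipeline

classifier = pipeline("text-classification", model="unitary/toxic-bert")

comments = [
    "Have a great day!",
    "You people are disgusting and should leave this country.",
]

for comment in comments:
    scores = classifier(comment, top_k=None)  # scores for every label
    # Aggregate the toxicity-related labels (e.g. "toxic", "severe_toxic").
    toxic_score = max(
        (s["score"] for s in scores if "toxic" in s["label"]), default=0.0
    )
    verdict = "FLAG" if toxic_score > 0.5 else "ok"
    print(f"{verdict:4} ({toxic_score:.2f}) {comment}")
```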
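For the detoxification bullet, the sketch below shows one way to prompt an LLM to rewrite a toxic message while keeping its meaning. The prompt wording, the model name (`gpt-4o-mini`), and the use of the OpenAI client are illustrative assumptions, not the shared task's setup; any instruction-following LLM could stand in.

```python
# Illustrative detoxification prompt; wording and model choice are
# assumptions, not the shared-task setup. Requires OPENAI_API_KEY.
from openai import OpenAI

client = OpenAI()

DETOX_PROMPT = (
    "Rewrite the following message so that it is polite and non-toxic "
    "while preserving the original meaning as closely as possible. "
    "Answer with the rewritten message only.\n\nMessage: {text}"
)

def detoxify(text: str, model: str = "gpt-4o-mini") -> str:
    """Return a detoxified paraphrase of `text`."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": DETOX_PROMPT.format(text=text)}],
        temperature=0.0,  # deterministic rewrites are easier to evaluate
    )
    return response.choices[0].message.content.strip()

print(detoxify("this idea is complete garbage and so are you"))
```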
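And for counter-speech, a sketch of how a prompt can name the targeted group so the reply addresses it directly instead of staying generic. The prompt template, the `target_group` argument, and the model choice are hypothetical illustrations, not our published method.

```python
# Illustrative targeted counter-speech prompt; the template, the
# `target_group` argument, and the model name are hypothetical.
from openai import OpenAI

client = OpenAI()

COUNTER_PROMPT = (
    "The following message attacks {target_group}. Write a short, calm "
    "counter-speech reply that addresses the harm done to {target_group} "
    "specifically, corrects the hateful claim, and avoids insults.\n\n"
    "Message: {text}"
)

def counter_speech(text: str, target_group: str,
                   model: str = "gpt-4o-mini") -> str:
    """Generate a reply that names and defends the targeted group."""
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": COUNTER_PROMPT.format(
                text=text, target_group=target_group
            ),
        }],
        temperature=0.7,  # some variation keeps replies from sounding canned
    )
    return response.choices[0].message.content.strip()

print(counter_speech("They don't belong in our city.",
                     target_group="immigrants"))
```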
Expected audience expertise: Domain: Intermediate
Expected audience expertise: Python: Novice
Hello, I’m Dr. Daryna Dementieva. Driven by both personal experience and deep passion, I am a dedicated advocate and researcher focused on leveraging AI and NLP for Positive Social Impact. As a technical person, I am currently exploring collaborations with NGOs and social scientists to bridge the gap between cutting-edge AI technology and societal needs. My goal is to share insights on responsible AI and Data Science, inspiring and enabling projects in these fields to move from concept to impactful reality.