PyCon DE & PyData 2026

Tracking Knowledge Diversity in LLM-Generated Responses.
Platinum [2nd Floor]

As “AI highlights” powered by large language models (LLMs) become the first information people see on the Web, a key question arises: how much variety and perspective do these systems actually deliver for information-seeking queries? Do LLMs offer broader viewpoints than traditional search or Wikipedia pages? Do larger models really produce more diverse answers, or are they all converging on the same language and framing, raising concerns about “knowledge collapse”?

Drawing insights from experiments across LLM families, real-world topics, and hundreds of user-style prompts, this talk introduces an open-source framework for benchmarking and tracking epistemic diversity in LLMs. We focus on practical lessons for data scientists building and evaluating LLM-powered search, summaries, and knowledge systems—where diversity of information actually matters.


This talk summarizes our research on how LLMs generate narratives and recurring tropes in real-world information-seeking setups via prompting.

Talk outline:
* Knowledge collapse and epistemic diversity: What they mean and why they matter for real-world information access (5 mins).
* Framework overview: How we measure epistemic diversity across LLM outputs (5 mins).
* Experimental design and results: Curating a dataset for comparisons across model families, search results, and Wikipedia pages (7 mins).
* Implications for designing LLM-powered systems that preserve information diversity (10 mins).
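
To make the idea of measuring diversity across outputs concrete, here is a minimal sketch of one common diversity index, the exponential of Shannon entropy (Hill number of order 1). It assumes the claims extracted from model responses have already been grouped into clusters of equivalent meaning. The function name `hill_shannon_diversity` and the cluster labels are hypothetical, chosen for illustration, and this is not necessarily the metric the framework implements.

```python
import math
from collections import Counter


def hill_shannon_diversity(cluster_labels):
    """Return the effective number of distinct clusters (exp of Shannon entropy).

    cluster_labels: one label per extracted claim, naming the cluster
    (group of equivalent claims) that the claim was assigned to.
    """
    counts = Counter(cluster_labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no claims, so no diversity to report
    # Shannon entropy (natural log) of the cluster frequency distribution.
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return math.exp(entropy)


# Hypothetical labels: three responses repeating one claim vs. four distinct claims.
collapsed = ["claim_a", "claim_a", "claim_a"]
varied = ["claim_a", "claim_b", "claim_c", "claim_d"]
print(hill_shannon_diversity(collapsed))
print(hill_shannon_diversity(varied))
```

The index reads as an "effective number" of equally common claims: it equals 1 when every response repeats the same claim and equals the number of clusters when all clusters are equally frequent.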

Key takeaways for AI practitioners:
* When can retrieval-augmented generation (RAG) increase diversity?
* Can expanding Wikipedia via translation improve epistemic diversity or reinforce existing tropes?
* What are some open challenges in measuring cultural and contextual diversity in LLM outputs?
* Where are we headed in terms of model sizes, fluency, and breadth of knowledge?

Useful links:
* Our open-source framework
* Reproducible data on Hugging Face
* Our research paper


Expected audience expertise in your talk's domain: Intermediate
Expected audience expertise in Python: Intermediate
Public link to supporting material, e.g. videos, GitHub: https://github.com/dwright37/llm-knowledge


Sarah Masud is currently a postdoc at the University of Copenhagen, exploring stereotypes and narratives. During her PhD at Indraprastha Institute of Information Technology, New Delhi, she explored the role of different context cues in improving computational hate speech-related tasks.