Platinum [2nd Floor]
As "AI highlights" powered by large language models (LLMs) become the first information people see on the Web, a key question arises: how much variety and perspective do these systems actually deliver for information-seeking queries? Do LLMs offer broader viewpoints than traditional search or Wikipedia pages? Do larger models really produce more diverse answers, or are they all converging on the same language and framing, raising concerns about "knowledge collapse"?
Drawing on experiments across LLM families, real-world topics, and hundreds of user-style prompts, this talk introduces an open-source framework for benchmarking and tracking epistemic diversity in LLMs. We focus on practical lessons for data scientists building and evaluating LLM-powered search, summaries, and knowledge systems, where the diversity of information actually matters.
This talk summarizes our research on how LLMs, when prompted in real-world information-seeking settings, generate narratives and recurring tropes.
Talk outline:
* Knowledge collapse and epistemic diversity: What they mean and why they matter for real-world information access (5 mins).
* Framework overview: How we measure epistemic diversity across LLM outputs (5 mins).
* Experimental design and results: Curating a dataset for comparisons across model families, search results, and Wikipedia pages (7 mins).
* Implications for designing LLM-powered systems that preserve information diversity (10 mins).
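For readers new to diversity metrics, here is a minimal sketch of one standard way to put a number on "how many distinct things" a set of outputs says: the Hill number of order 1, i.e. the exponential of the Shannon entropy over groups of equivalent claims. This is a generic illustration, not necessarily the framework's actual method. It assumes that model outputs have already been split into claims and that claims with the same meaning share a label. That clustering step is not shown, and the function name and example labels are hypothetical.

```python
import math
from collections import Counter


def hill_shannon_diversity(cluster_labels):
    """Hill number of order 1: exp(Shannon entropy) of the cluster distribution.

    `cluster_labels` holds one entry per extracted claim; claims judged to mean
    the same thing share a label (clustering is assumed to happen upstream).
    The result can be read as the "effective number" of distinct claims.
    """
    if not cluster_labels:
        return 0.0
    counts = Counter(cluster_labels)
    total = sum(counts.values())
    # Shannon entropy (natural log) of the relative cluster frequencies.
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return math.exp(entropy)


# Hypothetical claim clusters from two models answering the same query.
model_a = ["A", "A", "A", "A", "B"]  # mostly repeats one claim
model_b = ["A", "B", "C", "D", "E"]  # five distinct claims
print(round(hill_shannon_diversity(model_a), 2))  # → 1.65
print(round(hill_shannon_diversity(model_b), 2))  # → 5.0
```

Raw diversity values like these grow with the number of sampled claims, so comparisons between models, search results, or Wikipedia pages are only fair at matched sample sizes or with some form of rarefaction; the sketch omits that correction.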
Key takeaways for AI practitioners:
* When can retrieval-augmented generation (RAG) increase diversity?
* Can expanding Wikipedia via translation improve epistemic diversity or reinforce existing tropes?
* What are some open challenges in measuring cultural and contextual diversity in LLM outputs?
* Where are we headed in terms of model sizes, fluency, and breadth of knowledge?
Useful links:
* Our open-source framework
* Reproducible data on Hugging Face
* Our research paper
Sarah Masud is currently a postdoc at the University of Copenhagen, exploring stereotypes and narratives. During her PhD at the Indraprastha Institute of Information Technology, New Delhi, she explored the role of different context cues in improving computational tasks related to hate speech.