BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//pretalx.com//euroscipy-2026//talk//PRCPUX
BEGIN:VTIMEZONE
TZID:CET
BEGIN:STANDARD
DTSTART:20001029T040000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:CET
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000326T030000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:CEST
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-euroscipy-2026-PRCPUX@pretalx.com
DTSTART;TZID=CET:20260720T121000
DTEND;TZID=CET:20260720T123000
DESCRIPTION:Scientific organizations struggle to extract actionable insight
 s from publication data tagged with inconsistent and noisy keywords. Trans
 forming hundreds of thousands of such keywords into a **110\,000+ concept*
 *\, semantically consistent taxonomy\, and attaching them hierarchically a
 t scale\, requires more than ad-hoc normalization: it demands careful syst
 em design.\n\nThis talk presents a production-grade pipeline that extends 
 **OpenAlex's 4-level framework** (Domain → Field → Subfield → Topic)
  with a granular **Concept layer**\, resulting in a **5-level scientific t
 axonomy**. The system combines **SPECTER2 embeddings** to model semantic s
 imilarity\, **Leiden graph clustering** to group 100K+ concepts\, and **Qd
 rant** for efficient vector-based hierarchical attachment.\n\nA central co
 ntribution is a strategic\, multi-stage integration of **LLMs**. Rather th
 an using LLMs end-to-end\, we deploy them at **5 targeted points** where s
 emantic judgment matters most: concept granularity filtering\, field class
 ification across **26 domains**\, cluster renaming\, explanation generatio
 n\, and validation of topic assignments using multi-embedding comparisons
 . D
 eterministic methods ensure scalability and reproducibility\, while LLMs p
 rovide semantic precision where embeddings alone fall short.\n\nThe resul
 ting ta
 xonomy is used in production to automatically tag **millions of publicatio
 ns**\, enabling real-time trend detection and portfolio-level analytics th
 at support strategic decision-making.
DTSTAMP:20260603T195503Z
LOCATION:Room 1.19 (Ground Floor\, Shannon)
SUMMARY:Building a Scientific Taxonomy at Scale with Graph Clustering\, Emb
 eddings\, and LLMs - Daniele Raimondi\, Feichi Lu
URL:https://pretalx.com/euroscipy-2026/talk/PRCPUX/
END:VEVENT
END:VCALENDAR
