BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//pretalx.com//euroscipy-2026//speaker//3U8HLG
BEGIN:VTIMEZONE
TZID:CET
BEGIN:STANDARD
DTSTART:20001029T040000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:CET
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000326T030000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:CEST
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-euroscipy-2026-PRCPUX@pretalx.com
DTSTART;TZID=CET:20260720T121000
DTEND;TZID=CET:20260720T123000
DESCRIPTION:Scientific organizations struggle to extract actionable insight
 s from publication data tagged with inconsistent and noisy keywords. Trans
 forming hundreds of thousands of such keywords into a **110\,000+ concept*
 *\, semantically consistent taxonomy\, and attaching them hierarchically a
 t scale\, requires more than ad-hoc normalization: it demands careful syst
 em design.\n\nThis talk presents a production-grade pipeline that extends 
 **OpenAlex's 4-level framework** (Domain → Field → Subfield → Topic)
  with a granular **Concept layer**\, resulting in a **5-level scientific t
 axonomy**. The system combines **SPECTER2 embeddings** to model semantic s
 imilarity\, **Leiden graph clustering** to group 100K+ concepts\, and **Qd
 rant** for efficient vector-based hierarchical attachment.\n\nA central co
 ntribution is a strategic\, multi-stage integration of **LLMs**. Rather th
 an using LLMs end-to-end\, we deploy them at **5 targeted points** where s
 emantic judgment matters most: concept granularity filtering\, field class
 ification across **26 domains**\, cluster renaming\, explanation generatio
 n\, and validation of topic assignments using multi-embedding comparisons
 . Deterministic methods ensure scalability and reproducibility\, while LLM
 s provide semantic precision where embeddings alone fall short.\n\nThe res
 ulting taxonomy is used in production to automatically tag **millions of p
 ublications**\, enabling real-time trend detection and portfolio-level ana
 lytics that support strategic decision-making.
DTSTAMP:20260603T191419Z
LOCATION:Room 1.19 (Ground Floor\, Shannon)
SUMMARY:Building a Scientific Taxonomy at Scale with Graph Clustering\, Emb
 eddings\, and LLMs - Daniele Raimondi\, Feichi Lu
URL:https://pretalx.com/euroscipy-2026/talk/PRCPUX/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-euroscipy-2026-G7BJFJ@pretalx.com
DTSTART;TZID=CET:20260721T152000
DTEND;TZID=CET:20260721T154000
DESCRIPTION:Scientific organizations struggle to extract actionable insight
 s from publication data tagged with inconsistent and noisy author keywords
 . Automatically assigning papers to consistent\, semantically grounded con
 cepts is essential for reliable trend detection\, search\, and analytics.\
 n\nThis talk presents a production-grade\, **two-stage classification pipe
 line** that tags hundreds of thousands of scientific papers against a **11
 0K+ concept taxonomy**. Given a fixed hierarchical taxonomy extending Open
 Alex's **4-level structure** with a granular concept layer\, the system co
 mbines **vector-based retrieval**\, **cross-encoder reranking**\, and targ
 eted **LLM validation** to achieve scalable and accurate paper classificat
 ion.\n\nIn **Stage 1 (Candidate Retrieval)**\, paper metadata (title\, abs
 tract\, author keywords) is embedded using **SPECTER2** and queried agains
 t **Qdrant** to retrieve a small\, high-recall candidate set from over **1
 10\,000 concepts**. In **Stage 2 (Reranking and Filtering)**\, **cross-enc
 oder models** perform fine-grained semantic matching\, while **LLMs (Azure
  OpenAI)** are selectively applied to resolve ambiguous cases and produce 
 confidence-scored assignments.\n\nDeployed on **millions of publications**
 \, the system standardizes noisy keywords and enriches paper metadata with
  semantically consistent concept tags\, enabling downstream analytics at s
 cale.
DTSTAMP:20260603T191419Z
LOCATION:Room 1.38 (Ground Floor\, Turing)
SUMMARY:Automating Scientific Paper Classification at Scale with Retrieval
 –Reranking and LLMs - Daniele Raimondi\, Feichi Lu
URL:https://pretalx.com/euroscipy-2026/talk/G7BJFJ/
END:VEVENT
END:VCALENDAR
