EuroSciPy 2026

Diogo Rodrigues

Senior AI Engineer with 7+ years of experience architecting and deploying end-to-end ML solutions at scale. Specialized in NLP, Generative AI (LLM, RAG), Vector Search, and MLOps.

Affiliation:

MDPI

Position / Job:

Data Scientist Lead


Sessions

07-20
15:20
30min
Finding the Right ROR: Semantic Search for Research Institutions
Diogo Rodrigues, Frank Sauerburger

Mapping freeform research affiliations to persistent identifiers such as ROR (Research Organization Registry) is harder than it looks. Institution names appear in many forms, including abbreviations, alternate spellings, local-language variants, and legacy names, which makes a reliable mapping difficult to achieve at scale.

In this talk, we present a semantic retrieval pipeline that reframes institution identification as a search problem rather than a string-matching task. Our system combines named entity recognition to extract institution entities, dense embeddings to represent their semantic meaning, and vector search to retrieve the most likely ROR matches. This approach allows us to handle noisy, incomplete, and multilingual inputs while remaining resilient to variation in how institutions are referenced.
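As a rough illustration of the retrieval step only, the sketch below ranks registry entries by cosine similarity to a query string. It is not the pipeline presented in the talk: hashed character n-grams stand in for dense embeddings from a trained model, a linear scan stands in for a vector index, the entity extraction step is assumed to have already produced the query string, and the registry IDs, `embed`, `build_index`, and `search` are hypothetical names chosen for the example.

```python
import math
import zlib

DIM = 512  # size of the toy embedding space (assumption for this sketch)


def embed(text, n=3):
    """Toy stand-in for a dense embedding: hashed character n-gram counts,
    L2-normalized so that a dot product equals cosine similarity."""
    padded = f" {text.lower()} "
    vec = [0.0] * DIM
    for i in range(len(padded) - n + 1):
        gram = padded[i:i + n]
        vec[zlib.crc32(gram.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a, b):
    """Dot product of two already-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical registry: placeholder IDs (not real ROR IDs) mapped to aliases.
REGISTRY = {
    "example-id-lisbon": ["University of Lisbon", "Universidade de Lisboa", "ULisboa"],
    "example-id-munich": ["Technical University of Munich", "Technische Universität München", "TUM"],
    "example-id-zurich": ["ETH Zurich", "Eidgenössische Technische Hochschule Zürich"],
}


def build_index(registry):
    """Embed every alias once; each row is (registry_id, alias, vector)."""
    return [
        (reg_id, alias, embed(alias))
        for reg_id, aliases in registry.items()
        for alias in aliases
    ]


def search(query, index, top_k=3):
    """Return up to top_k (registry_id, best_alias, score) tuples,
    keeping only the best-scoring alias per registry entry."""
    q = embed(query)
    best = {}
    for reg_id, alias, vec in index:
        score = cosine(q, vec)
        if reg_id not in best or score > best[reg_id][0]:
            best[reg_id] = (score, alias)
    ranked = sorted(best.items(), key=lambda kv: kv[1][0], reverse=True)
    return [(reg_id, alias, score) for reg_id, (score, alias) in ranked[:top_k]]


if __name__ == "__main__":
    index = build_index(REGISTRY)
    for hit in search("Univ. de Lisboa, Portugal", index):
        print(hit)
```

Indexing every known alias and collapsing results per identifier is one simple way to let abbreviations and local-language names resolve to the same entry. A production system would swap in a trained embedding model and an approximate nearest-neighbour index.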

By treating institution matching as semantic retrieval, we improve recall and robustness without relying on heuristics or on a continuously expanding set of rules. The system scales naturally as new institutions are added and as naming conventions evolve, making it well suited to the dynamic research environment.

We will share implementation details, evaluation results, and practical lessons learned from deploying this pipeline in a real-world production setting.

Applied AI & LLM Technologies and Use Cases
Room 1.19 (Ground Floor, Shannon)
07-21
15:20
20min
Boring AI Works: When BERT Beats Billion-Parameter Models
Diogo Rodrigues

Recent advances in AI have shifted industry attention toward integrating LLM-based systems. Although LLMs can solve a wide range of business problems, they come with significant complexity overhead. At the same time, many real-world business applications involve well-defined objectives, predictable inputs, and clear evaluation criteria.

Today, we increasingly see a default pattern: for almost any NLP use case, teams prompt GPT-like models and pay the bill at the end of the month. However, this approach often introduces unnecessary complexity, cost, and operational risk. Many business and research problems exist in constrained environments and can be solved with simpler techniques that achieve the same or higher success rates.

This talk argues that fine-tuned BERT-based models remain a strong and often superior choice for targeted business use cases that require NLP-based solutions. I will present a real, in-production use case where a simple transformer-based classifier demonstrates a more favourable performance-cost trade-off than LLM-based approaches, driven by lower latency, reduced operational complexity, easier fine-tuning, and significantly lower maintenance costs.

The goal of this presentation is not to reject LLMs, but to promote a pragmatic, outcome-driven approach to NLP, where “boring” solutions often deliver the most value.

Applied AI & LLM Technologies and Use Cases
Room 2.41 (First Floor, Turing)