2025-12-08 – Abigail Adams
PubMed is a free search interface for biomedical literature, providing citations and abstracts from many life science journals. It is maintained by the National Library of Medicine at the NIH. Yet most users only interact with it through simple keyword searches. In this hands-on tutorial, we introduce PubMed as a data source for intelligent biomedical research assistants — and build a Health Research AI Agent using modern agentic AI frameworks such as LangChain, LangGraph, and the Model Context Protocol (MCP), with minimal hardware requirements and no API keys. To ensure compatibility, the agent runs in a Docker container that hosts all necessary components.
Participants will learn how to connect language models to structured biomedical knowledge, design context-aware queries, and containerize the entire system with Docker for maximum portability. By the end, attendees will have a working prototype that can read and reason over PubMed abstracts, group and summarize findings using semantic similarity, and assist with literature exploration — all running locally on modest hardware.
Expected Audience: Enthusiasts, researchers, and data scientists interested in AI agents, biomedical text mining, or practical LLM integration.
Prior Knowledge: Python and Docker familiarity; no biomedical background required.
Minimum Hardware Requirements: 8 GB RAM (16 GB recommended), 30 GB disk space, Docker pre-installed. macOS, Windows, or Linux.
Key Takeaway: How to build a lightweight, reproducible research agent that combines open biomedical data with modern agentic AI frameworks.
This 90-minute hands-on tutorial explores how modern agentic AI applications can transform simple keyword searches into intelligent research assistants. PubMed, maintained by the U.S. National Library of Medicine, provides over 36 million indexed biomedical records. While most users rely on simple keyword searches, this session demonstrates how to build a Health Research AI Agent that can search, group, summarize, and reason over PubMed abstracts — using only open data and lightweight infrastructure.
The session walks through the complete development pipeline: querying PubMed, parsing structured metadata, and integrating modern agentic AI frameworks such as LangChain, LangGraph, and the Model Context Protocol (MCP) to enable context-aware reasoning. All components will run inside a Docker container, ensuring a lightweight and reproducible environment that requires no API keys or external cloud resources. This flexibility comes at minimal hardware cost (8 GB RAM, 30 GB disk).
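For concreteness, the sketch below shows one way the PubMed-querying step could look, using the public NCBI E-utilities endpoints (esearch and efetch). The helper names and the example query are illustrative and are not the tutorial's actual code:

```python
"""Minimal sketch: fetch PubMed abstracts via the NCBI E-utilities API.

The esearch/efetch endpoints are NCBI's public E-utilities; the function
names and the example query below are illustrative assumptions.
"""
import requests
import xml.etree.ElementTree as ET

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def search_pubmed(term: str, retmax: int = 20) -> list[str]:
    """Return PubMed IDs (PMIDs) matching a query term."""
    # NCBI asks clients to stay under ~3 requests/sec without an API key.
    resp = requests.get(
        f"{EUTILS}/esearch.fcgi",
        params={"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["esearchresult"]["idlist"]

def fetch_abstracts(pmids: list[str]) -> dict[str, str]:
    """Fetch abstract text for each PMID via efetch (XML response)."""
    resp = requests.get(
        f"{EUTILS}/efetch.fcgi",
        params={"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"},
        timeout=60,
    )
    resp.raise_for_status()
    root = ET.fromstring(resp.text)
    abstracts: dict[str, str] = {}
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext(".//PMID")
        # AbstractText may be split into labeled sections; join their text.
        text = " ".join((t.text or "") for t in article.iter("AbstractText"))
        if pmid and text.strip():
            abstracts[pmid] = text.strip()
    return abstracts

if __name__ == "__main__":
    pmids = search_pubmed("intermittent fasting AND type 2 diabetes", retmax=10)
    for pmid, abstract in fetch_abstracts(pmids).items():
        print(pmid, abstract[:120], "...")
```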
Participants will gain hands-on experience connecting language models that perform distinct, isolated tasks into a coherent agentic solution. A 'maestro' agent refines the user's query, downloads abstracts from PubMed, and passes them to a classifier agent, which groups the abstracts by their embeddings. A summarizer agent then condenses the abstracts within each identified group. By the end of the tutorial, each attendee will have a working local prototype capable of summarizing research findings and assisting with literature exploration — using only open data and modest hardware.
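As a minimal sketch of the classifier step, the snippet below embeds abstracts with a small sentence-transformers model and clusters them with KMeans. The model name ('all-MiniLM-L6-v2') and the choice of KMeans are assumptions for illustration, not necessarily what the tutorial ships:

```python
"""Sketch of the classifier agent: group abstracts by embedding similarity.

Assumes sentence-transformers and scikit-learn are installed; the model and
clustering algorithm here are illustrative choices.
"""
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

def group_abstracts(abstracts: dict[str, str], n_groups: int = 3) -> dict[int, list[str]]:
    """Embed each abstract and cluster the embeddings into n_groups classes.

    Assumes len(abstracts) >= n_groups. Returns {cluster label: [pmids]}.
    """
    model = SentenceTransformer("all-MiniLM-L6-v2")  # small, CPU-friendly model
    pmids = list(abstracts)
    embeddings = model.encode([abstracts[p] for p in pmids], normalize_embeddings=True)
    labels = KMeans(n_clusters=n_groups, n_init=10, random_state=0).fit_predict(embeddings)
    groups: dict[int, list[str]] = {}
    for pmid, label in zip(pmids, labels):
        groups.setdefault(int(label), []).append(pmid)
    return groups  # each group of PMIDs is then handed to the summarizer agent
```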
Outline:
0–10 min: Setup & Overview of PubMed. Clone GitHub repo.
10–20 min: Overview of Transformers, Embeddings and Agentic AI.
20–50 min: Understand the basics of the codebase. Choose-your-own-prompt activity. See how LangChain/LangGraph and MCP work together (a wiring sketch follows this outline).
50–65 min: Reasoning & Semantic Search
65–85 min: Deploying Docker container & Testing
85–90 min: Wrap-Up & Discussion
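As referenced in the 20–50 min block, the sketch below shows how the three agents could be wired as a LangGraph state graph. The node bodies are placeholders standing in for the actual maestro, classifier, and summarizer agents; only the graph-wiring calls (StateGraph, add_node, add_edge, compile) are LangGraph API:

```python
"""Sketch of the agent pipeline as a LangGraph state graph:
maestro -> classifier -> summarizer. Node internals are placeholders."""
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class PipelineState(TypedDict, total=False):
    query: str           # raw user question
    refined_query: str   # maestro's improved PubMed query
    abstracts: dict      # pmid -> abstract text
    groups: dict         # cluster label -> list of pmids
    summaries: dict      # cluster label -> summary text

def maestro(state: PipelineState) -> PipelineState:
    # Placeholder: a real agent would rewrite the query with an LLM
    # and download matching abstracts from PubMed.
    return {"refined_query": state["query"], "abstracts": {}}

def classifier(state: PipelineState) -> PipelineState:
    # Placeholder: embed abstracts and group them by similarity.
    return {"groups": {0: list(state["abstracts"])}}

def summarizer(state: PipelineState) -> PipelineState:
    # Placeholder: summarize the abstracts within each group.
    return {"summaries": {label: "..." for label in state["groups"]}}

graph = StateGraph(PipelineState)
graph.add_node("maestro", maestro)
graph.add_node("classifier", classifier)
graph.add_node("summarizer", summarizer)
graph.add_edge(START, "maestro")
graph.add_edge("maestro", "classifier")
graph.add_edge("classifier", "summarizer")
graph.add_edge("summarizer", END)

app = graph.compile()
result = app.invoke({"query": "effects of intermittent fasting on type 2 diabetes"})
print(result.get("summaries"))
```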
Expected Audience: Researchers and enthusiasts interested in AI agents, biomedical NLP, or practical LLM applications.
Prior Knowledge: Familiarity with Python and Docker; no biomedical background required.
Key Takeaway: How to build a reproducible, self-contained AI research assistant that combines open biomedical data with modern agentic AI frameworks for reasoning and retrieval.