PyData Boston 2025

Using Traditional AI and LLMs to Automate Complex and Critical Documents in Healthcare
2025-12-09, Horace Mann

Informed Consent Forms (ICFs) are critical documents in clinical trials. They are the first, and often most crucial, touchpoint between a patient and a clinical trial. Yet the process of developing them is laborious, high-stakes, and heavily regulated. Each form must be tailored to jurisdictional requirements and local ethics boards, reviewed by cross-functional teams, and written in plain language that patients can understand. Producing them at scale across countries and disease areas demands manual effort and creates major operational bottlenecks. We used a combination of traditional AI and large language models to autodraft ICFs at scale across clinical trial types, countries, and disease areas. The build, test, iteration, and deployment cycle offers both technical and non-technical lessons for generative AI applications that produce complex documents at scale with meaningful impact.


Informed Consent Forms are highly complex documents that require high precision and quality. A Phase 2/3 clinical trial can have almost 1,000 different forms, which take considerable time to complete. We identified this as a challenge that directly impacts trial timelines and patient engagement. Our automated solution is the “ICF Autodrafter”, a custom LLM-powered application that automates the drafting of ICFs. The tool ingests a clinical trial protocol and an ICF template and outputs a complete draft in minutes, cutting document preparation time by 90%.

This solution is not generic automation. The backend logic parses highly structured protocol documents, segments them, and feeds the relevant content into a carefully fine-tuned LLM that maps text to specific ICF fields. The front-end is designed for usability by clinical trial managers, with human-in-the-loop reviews. This system has already supported ICF creation for more than ten trials and has achieved near-perfect consistency (97%) with human-generated content, underscoring the speed, quality, and robustness of the solution.
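The parse, segment, and map flow described above can be illustrated with a minimal Python sketch. It is not the ICF Autodrafter's actual code: the field names in `FIELD_SOURCES`, the numbered-heading convention, the `[NEEDS HUMAN INPUT]` placeholder, and the `draft_fn` callable (a stand-in for the fine-tuned LLM) are all assumptions made for illustration.

```python
import re
from typing import Callable, Dict, List

# Hypothetical mapping from ICF template fields to the protocol
# sections that feed them. A real template would define its own fields.
FIELD_SOURCES: Dict[str, List[str]] = {
    "study_purpose": ["Objectives"],
    "procedures": ["Study Design", "Schedule of Assessments"],
    "risks": ["Safety"],
}

# Assumes protocol headings look like "1 Objectives" or "2.1 Study Design".
HEADING = re.compile(r"^\d+(?:\.\d+)*\s+(.+)$")


def segment_protocol(text: str) -> Dict[str, str]:
    """Split a protocol into {section title: body text} by numbered headings."""
    sections: Dict[str, List[str]] = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        match = HEADING.match(line)
        if match:
            current = match.group(1).strip()
            sections[current] = []
        elif current is not None and line:
            sections[current].append(line)
    return {title: " ".join(body) for title, body in sections.items()}


def draft_icf(protocol_text: str,
              draft_fn: Callable[[str, str], str]) -> Dict[str, str]:
    """Route relevant protocol sections to each ICF field and draft it.

    draft_fn(field, context) stands in for the LLM call. Fields with no
    source content are flagged for the human reviewer instead of drafted.
    """
    sections = segment_protocol(protocol_text)
    draft: Dict[str, str] = {}
    for field, titles in FIELD_SOURCES.items():
        context = "\n".join(sections[t] for t in titles if t in sections)
        draft[field] = draft_fn(field, context) if context else "[NEEDS HUMAN INPUT]"
    return draft


if __name__ == "__main__":
    sample = "1 Objectives\nEvaluate the study treatment.\n2 Study Design\nRandomized trial."
    # A trivial stub replaces the model so the sketch runs offline.
    for field, text in draft_icf(sample, lambda f, c: f"{f}: {c}").items():
        print(field, "->", text)
```

Passing the model in as a plain callable keeps the segmentation and routing logic testable without an LLM, and the explicit placeholder for unmapped fields gives the human-in-the-loop reviewer a clear signal about what still needs attention.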

We rigorously tested each version with A/B comparisons, iterated on feedback from end users, and anchored all development within regulatory and ethical guardrails. The impact extends beyond efficiency. By standardizing and accelerating ICF production, we can reduce delays in trial start-up and potentially get medicines to patients faster, without compromising safety, compliance, or clarity. The work also lays out a scalable model for future AI-driven document workflows across other parts of life sciences and healthcare.


Prior Knowledge Expected: No previous knowledge expected

Aman Bhandari leads the corporate data science/AI function at Vertex Pharmaceuticals. This division integrates and scales advanced analytics and AI (e.g. NLP, machine learning, generative AI/LLMs) across disease and business areas including clinical, commercial, manufacturing and HR. Collaborating directly with executive management, the privacy office and IT, he has developed enterprise capabilities and a model for using AI to drive impact.

Prior to joining Vertex in 2017, he held roles at Merck, Genentech, the White House, and the Centers for Medicare and Medicaid Services. In these roles he created the first formal data science team at Merck and, while at the White House, paved the way for the first Chief Data Officers across the U.S. Government. Aman earned his PhD in health services research and a master's in epidemiology, with a focus on using large-scale data to better understand healthcare. He has been an advisor to data/tech initiatives for the World Bank, USAID, Harvard, Cornell, Ashoka Foundation, Knight News Foundation, Boston Children's Hospital and others.

Lily Xu is a Clinical Data Science Director in the corporate data science team at Vertex Pharmaceuticals. She has a wealth of experience in building cutting-edge data science solutions for clinical operations and commercial strategy across multiple disease areas. Lily has spearheaded projects on GenAI for clinical documentation, predictive patient modeling, large-scale claims analytics, data-driven protocol design, and centralized site intelligence applications.