PyData London 2026

Building a Browser Agent from Scratch: Teach an LLM to Navigate the Web
2026-06-05, Doddington Forum

AI systems that can autonomously navigate websites, fill forms, extract data, and complete multi-step workflows are one of the most exciting and practical applications of large language models in 2026. Libraries like browser-use (60k+ GitHub stars) and Skyvern have demonstrated their potential, but their abstractions can obscure the surprisingly approachable fundamentals underneath.

In this 90-minute hands-on tutorial, attendees will build a browser agent entirely from scratch using only Python, Playwright, and an LLM API. No agent frameworks, no magic, just the core building blocks: extracting and structuring the DOM into an LLM-friendly representation, capturing screenshots for vision-based reasoning, building the observe-think-act agent loop, and handling real-world challenges like dynamic content, multi-tab navigation, and error recovery.
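As a rough illustration of the first building block, the sketch below turns raw HTML into a short, numbered list of interactive elements that a model could refer to by index. It is a minimal sketch under stated assumptions, not the tutorial's own code: it parses a static HTML string with the standard library's `html.parser` rather than reading a live page through Playwright, and the names `InteractiveElementIndexer`, `describe_page`, and the choice of tags and attributes are hypothetical.

```python
from html.parser import HTMLParser

# Tags treated as interactive in this sketch (an assumption, not an exhaustive list).
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
VOID_TAGS = {"input"}  # no closing tag, so there is no inner text to collect


class InteractiveElementIndexer(HTMLParser):
    """Collect interactive elements and give each one a numeric index."""

    def __init__(self):
        super().__init__()
        self.elements = []  # one dict per interactive element, in document order
        self._open = None   # element whose inner text is currently being collected

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS:
            return
        element = {"index": len(self.elements), "tag": tag,
                   "attrs": dict(attrs), "text": ""}
        self.elements.append(element)
        self._open = None if tag in VOID_TAGS else element

    def handle_data(self, data):
        if self._open is not None:
            self._open["text"] += data

    def handle_endtag(self, tag):
        if self._open is not None and self._open["tag"] == tag:
            self._open = None


def describe_page(html):
    """Return one line per interactive element, ready to paste into a prompt."""
    indexer = InteractiveElementIndexer()
    indexer.feed(html)
    indexer.close()
    lines = []
    for el in indexer.elements:
        # Keep only a few attributes that help a model identify the element.
        shown = {k: v for k, v in el["attrs"].items()
                 if k in ("href", "name", "type", "placeholder") and v}
        attr_text = "".join(f" {k}={v}" for k, v in shown.items())
        text = " ".join(el["text"].split())  # collapse whitespace
        lines.append(f"[{el['index']}] <{el['tag']}{attr_text}> {text}".rstrip())
    return "\n".join(lines)
```

The numeric index is the design choice that matters here: the model can answer with something like "click element 2" instead of inventing a CSS selector, and the agent maps that index back to a concrete element.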

By building from first principles, attendees will gain a deep understanding of how browser agents actually work, knowledge that transfers directly to using, debugging, and extending any browser agent framework. Every participant will leave with a working agent that can autonomously complete tasks on live websites.

This tutorial is aimed at Python developers and data scientists who are curious about AI-driven browser automation. Basic Python proficiency and familiarity with async/await are expected. No prior experience with Playwright, browser automation, or agent frameworks is required.


The web is the world’s largest API, but it was designed for humans, not machines. Traditional browser automation tools like Selenium and Playwright require developers to write brittle scripts with hardcoded selectors that break whenever a website changes its layout. Browser agents flip this model: instead of telling the browser exactly what to click, you describe what you want to accomplish, and an LLM figures out how to do it by reading the page like a human would, reasoning about what to do next, and adapting when things don’t go as expected.

This approach has seen explosive growth. The open-source browser-use library surpassed 60,000 GitHub stars within months of release, and its creators raised $17M in seed funding. Skyvern, Browserbase, and others have built commercial platforms around the same idea. Under the hood, these tools all share a remarkably similar architecture: a perception layer that converts web pages into LLM-readable context, a reasoning layer where the LLM decides what action to take, and an execution layer that carries out the action via browser automation.
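The three layers described above can be sketched as a small observe-think-act loop. This is a minimal sketch under stated assumptions rather than any library's actual design: `FakeBrowser` is a hypothetical stand-in for a real browser session, `scripted_llm` is a hypothetical stub where a real agent would call an LLM API, and the JSON action format (`click`, `done`) is invented for illustration.

```python
import json
from dataclasses import dataclass, field


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for a real browser session (e.g. a Playwright page)."""
    url: str = "https://example.com/"
    log: list = field(default_factory=list)

    def observe(self):
        # Perception layer: turn the current page into text the model can read.
        return f"URL: {self.url}\n[0] <a href=/docs> Docs"

    def execute(self, action):
        # Execution layer: carry out the action chosen by the model.
        self.log.append(action)
        if action["type"] == "click" and action["index"] == 0:
            self.url = "https://example.com/docs"


def scripted_llm(task, observation, history):
    """Reasoning layer stub: a real agent would send these inputs to an LLM API."""
    if "/docs" in observation.splitlines()[0]:
        return json.dumps({"type": "done", "answer": "Reached the docs page"})
    return json.dumps({"type": "click", "index": 0})


def run_agent(task, browser, llm, max_steps=5):
    """Repeat observe, think, act until the model reports it is done."""
    history = []
    for _ in range(max_steps):
        observation = browser.observe()                       # observe
        action = json.loads(llm(task, observation, history))  # think
        history.append(action)
        if action["type"] == "done":
            return action["answer"]
        browser.execute(action)                               # act
    return None  # step budget exhausted without finishing
```

Calling `run_agent("Open the docs", FakeBrowser(), scripted_llm)` clicks the link once and then returns the model's final answer. The `max_steps` cap is one simple guard against an agent that never decides it is finished.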

This tutorial strips away the abstraction layers and builds each component from scratch. The “from scratch” approach is deliberate: by understanding how the DOM is parsed, how screenshots are fed to vision models, and how the agent loop manages state, attendees gain transferable knowledge that applies to any browser agent tool or framework. When something breaks in production (and it will), this understanding is what separates debugging from guessing.

Richard Kehinde Ogunyale is a Senior Software Engineer based in London, UK, with experience building production AI systems, scalable microservices, and machine learning pipelines. He currently works at Partnerize, where he leads projects involving AI-powered solutions, and has previously built RAG systems with vector databases and LLM-powered automation workflows using DAG architectures at scale.

He is passionate about open source, practical AI engineering, and bridging the gap between ML prototypes and reliable production systems.

Oreolorun Olu-Ipinlaye is a Machine Learning/AI Engineer at Crowdhelix in London, where he builds production AI systems end-to-end for a platform connecting researchers with EU funding opportunities. As the lead engineer behind ReviewIQ, a self-hosted-LLM proposal review tool used by researchers across dozens of organisations, he has helped researchers assess their proposals before submission, leading to more competitive submissions.

His work spans the full stack of applied ML: self-hosted LLM infrastructure, recommender systems, semantic and hybrid search, and the event-driven pipelines underneath, built largely in Python. He's particularly drawn to taking ML products from idea to adoption with measurable impact, and to making AI capabilities legible to non-technical stakeholders.

He holds an MSc in Artificial Intelligence and Data Science from the University of Hull, where he earned the award for Best Overall Performance.