2026-06-05, Doddington Forum
AI systems that can autonomously navigate websites, fill forms, extract data, and complete multi-step workflows are one of the most exciting and practical applications of large language models in 2026. Libraries like browser-use (60k+ GitHub stars) and Skyvern have demonstrated their potential, but their abstractions can obscure the surprisingly approachable fundamentals underneath.
In this 90-minute hands-on tutorial, attendees will build a browser agent entirely from scratch using only Python, Playwright, and an LLM API. No agent frameworks, no magic, just the core building blocks: extracting and structuring the DOM into an LLM-friendly representation, capturing screenshots for vision-based reasoning, building the observe-think-act agent loop, and handling real-world challenges like dynamic content, multi-tab navigation, and error recovery.
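To give a flavour of the first building block, here is a minimal sketch of turning raw HTML into a compact, numbered list of interactive elements that an LLM could refer to by index. It is an illustration under stated assumptions, not the tutorial's actual code: the names `InteractiveElementParser` and `summarize_dom`, the choice of tags, and the output format are all hypothetical, and it uses only the standard library's `html.parser` so it runs without a browser.

```python
from html.parser import HTMLParser

# Tags treated as "interactive" in this sketch (an assumption, not a standard list).
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
# Interactive tags that never have a closing tag, so they collect no text.
VOID_TAGS = {"input"}
# Attributes surfaced to the LLM as hints (also an illustrative choice).
HINT_ATTRS = ("type", "name", "placeholder", "href")


class InteractiveElementParser(HTMLParser):
    """Collects interactive elements and assigns each a stable index."""

    def __init__(self):
        super().__init__()
        self.elements = []  # every interactive element seen, in document order
        self._open = []     # stack of elements still collecting inner text

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS:
            return
        record = {"index": len(self.elements), "tag": tag,
                  "attrs": dict(attrs), "text": ""}
        self.elements.append(record)
        if tag not in VOID_TAGS:
            self._open.append(record)

    def handle_endtag(self, tag):
        if self._open and self._open[-1]["tag"] == tag:
            self._open.pop()

    def handle_data(self, data):
        # Text is attributed to the innermost open interactive element.
        if self._open:
            self._open[-1]["text"] += data


def summarize_dom(html: str) -> str:
    """Return one line per interactive element, e.g. '[1] <button> Go'."""
    parser = InteractiveElementParser()
    parser.feed(html)
    parser.close()
    lines = []
    for el in parser.elements:
        hints = "".join(
            ' {}="{}"'.format(key, el["attrs"][key])
            for key in HINT_ATTRS
            if el["attrs"].get(key)
        )
        text = " ".join(el["text"].split())  # collapse whitespace
        line = "[{}] <{}{}>".format(el["index"], el["tag"], hints)
        if text:
            line += " " + text
        lines.append(line)
    return "\n".join(lines)
```

In a real agent the HTML string would typically come from the live page (for example via Playwright's `page.content()`), and the index would be mapped back to an element when the model chooses an action. The point of the sketch is only that the model sees a short, numbered summary rather than the full markup.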
By building from first principles, attendees will gain a deep understanding of how browser agents actually work, knowledge that transfers directly to using, debugging, and extending any browser agent framework. Every participant will leave with a working agent that can autonomously complete tasks on live websites.
This tutorial is aimed at Python developers and data scientists who are curious about AI-driven browser automation. Basic Python proficiency and familiarity with async/await are expected. No prior experience with Playwright, browser automation, or agent frameworks is required.
The web is the world’s largest API, but it was designed for humans, not machines. Traditional browser automation tools like Selenium and Playwright require developers to write brittle scripts with hardcoded selectors that break whenever a website changes its layout. Browser agents flip this model: instead of telling the browser exactly what to click, you describe what you want to accomplish, and an LLM figures out how to do it by reading the page as a human would, reasoning about what to do next, and adapting when things don’t go as expected.
This approach has seen explosive growth. The open-source browser-use library surpassed 60,000 GitHub stars within months of release, and its creators raised $17M in seed funding. Skyvern, Browserbase, and others have built commercial platforms around the same idea. Under the hood, these tools all share a remarkably similar architecture: a perception layer that converts web pages into LLM-readable context, a reasoning layer where the LLM decides what action to take, and an execution layer that carries out the action via browser automation.
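The three layers described above can be sketched as a small loop. This is a simplified illustration rather than the architecture of any particular library: `Action`, `run_agent`, and the `observe`, `think`, and `act` callables are hypothetical names, and the LLM and browser are passed in as plain functions so the control flow can be read (and tested) without either.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """An action chosen by the reasoning layer, e.g. click, type, or done."""
    name: str
    args: dict = field(default_factory=dict)


def run_agent(task, observe, think, act, max_steps=10):
    """Run an observe-think-act loop.

    observe() -> str                         perception: page as LLM-readable text
    think(task, observation, history) -> Action   reasoning: pick the next action
    act(action) -> str                       execution: perform it in the browser
    Returns (result, history); result is None if max_steps is exhausted.
    """
    history = []  # (action, outcome) pairs, fed back to the reasoning layer
    for _ in range(max_steps):
        observation = observe()
        action = think(task, observation, history)
        if action.name == "done":
            return action.args.get("result"), history
        try:
            outcome = act(action)
        except Exception as exc:
            # Record the failure instead of crashing, so the next
            # think() call can see it and try something else.
            outcome = "error: {}".format(exc)
        history.append((action, outcome))
    return None, history
```

In a real agent, `observe` might wrap a DOM summary and a screenshot taken with Playwright, `think` would call an LLM API and parse its reply into an `Action`, and `act` would translate the action into browser calls. Keeping the history of actions and outcomes is one simple way to let the model notice errors and adapt.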
This tutorial strips away the abstraction layers and builds each component from scratch. The “from scratch” approach is deliberate: by understanding how the DOM is parsed, how screenshots are fed to vision models, and how the agent loop manages state, attendees gain transferable knowledge that applies to any browser agent tool or framework. When something breaks in production (and it will), this understanding is what separates debugging from guessing.
Richard Kehinde Ogunyale is a Senior Software Engineer based in London, UK, with experience building production AI systems, scalable microservices, and machine learning pipelines. He currently works at Partnerize, where he leads projects involving AI-powered solutions, and has previously built RAG systems with vector databases and LLM-powered automation workflows using DAG architectures at scale.
He is passionate about open source, practical AI engineering, and bridging the gap between ML prototypes and reliable production systems.
Oreolorun is a machine learning engineer with experience in building AI-enabled software features and data processing for AI workflows.