AI systems that can autonomously navigate websites, fill forms, extract data, and complete multi-step workflows are one of the most exciting and practical applications of large language models in 2026. Libraries like browser-use (60k+ GitHub stars) and Skyvern have demonstrated their potential, but their abstractions can obscure the surprisingly approachable fundamentals underneath.
In this 90-minute hands-on tutorial, attendees will build a browser agent entirely from scratch using only Python, Playwright, and an LLM API. No agent frameworks and no magic, just the core building blocks: extracting and structuring the DOM into an LLM-friendly representation, capturing screenshots for vision-based reasoning, building the observe-think-act agent loop, and handling real-world challenges like dynamic content, multi-tab navigation, and error recovery.
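To give a feel for the observe-think-act loop mentioned above, here is a minimal sketch. It is an illustration under stated assumptions, not the tutorial's actual code: `Action`, `run_agent`, `FakePage`, and `scripted_think` are hypothetical names, the in-memory `FakePage` stands in for a Playwright page, and the scripted policy stands in for an LLM call so that the example runs without a browser or an API key.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str         # "type", "click", or "done" (hypothetical action vocabulary)
    target: str = ""  # element index taken from the observation text
    text: str = ""    # text to enter for "type" actions


def run_agent(
    task: str,
    observe: Callable[[], str],
    think: Callable[[str, str, List[Action]], Action],
    act: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Observe the page, ask the policy for an action, apply it, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        observation = observe()                      # observe
        action = think(task, observation, history)   # think
        history.append(action)
        if action.kind == "done":
            break
        act(action)                                  # act
    return history


class FakePage:
    """In-memory stand-in for a browser page (assumption, not Playwright)."""

    def __init__(self) -> None:
        self.query = ""
        self.submitted = False

    def observe(self) -> str:
        # A tiny "LLM-friendly" text rendering of the page state.
        return (
            f"[1] input#q value={self.query!r}\n"
            f"[2] button#go submitted={self.submitted}"
        )

    def act(self, action: Action) -> None:
        if action.kind == "type":
            self.query = action.text
        elif action.kind == "click":
            self.submitted = True


def scripted_think(task: str, observation: str, history: List[Action]) -> Action:
    """Stand-in for the LLM call: fill the box, press the button, then stop."""
    if "value=''" in observation:
        return Action("type", target="1", text=task)
    if "submitted=False" in observation:
        return Action("click", target="2")
    return Action("done")


page = FakePage()
steps = run_agent("playwright docs", page.observe, scripted_think, page.act)
print([a.kind for a in steps])  # ['type', 'click', 'done']
```

Passing `observe`, `think`, and `act` in as plain callables keeps the loop itself independent of any browser or model. In a real agent, `observe` would presumably serialize the live DOM or capture a screenshot, and `think` would call an LLM API.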
By building from first principles, attendees will gain a deep understanding of how browser agents actually work, knowledge that transfers directly to using, debugging, and extending any browser agent framework. Every participant will leave with a working agent that can autonomously complete tasks on live websites.
This tutorial is aimed at Python developers and data scientists who are curious about AI-driven browser automation. Basic Python proficiency and familiarity with async/await are expected. No prior experience with Playwright, browser automation, or agent frameworks is required.