PyCon Hong Kong 2025
We learn Python through tutorials, bootcamps, or classes. Syntax here, libraries there. Maybe a few YouTube deep dives and some late-night Stack Overflow scrolling. The resources are endless.
But here’s what many miss: Python isn’t just a language. It’s a living, breathing community. And knowing that changes everything.
In this talk, we’ll look at the part of Python that doesn’t come in a package: its people. You’ll learn what the Python Software Foundation actually does (besides existing), how decisions about the language are made, and why even local meetups in Hong Kong are part of something much bigger.
Whether you’re new to Python, teaching it, or wondering what keeps this language thriving across the world—this talk will give you new ways to connect, contribute, and grow.
Because “Come for the language, stay for the community” isn’t just a feel-good line. It’s Python’s secret sauce. And once you taste it, it’s better than Lee Kum Kee.
This presentation explores the rapidly evolving field of Agentic AI that is gaining significant industry attention, with a focus on practical Python-based implementations.
The first section introduces the fundamental concepts and technical architecture of Agentic AI, Agentic IDE (Kiro), AgentCore, and MCP.
The second section demonstrates concrete applications of Agentic AI on AWS built with Python, featuring a conversational weather agent implemented using Nova Act and Python libraries, alongside analyses of agent-based solutions for e-commerce and hotel reservation systems.
The third section provides practical guidance on developing enterprise-grade Agentic AI applications using the open-source Strands Agents SDK with Python.
This presentation aims to inspire innovative approaches among Python developers and foster collaborative exploration of the expanding potential in Python-based Agentic AI application development.
Every new LLM comes with glowing performance on English-centric benchmarks. This makes it difficult to predict how that performance will translate to business use cases in other languages or specialized domains. At Mercari, Japan's largest C2C marketplace, we faced this exact problem with Japanese. Inspired by Kagi, Wolfram, and Aider benchmarks, we are building our own continuously updated internal benchmark to evaluate major LLMs on unpolluted, business-critical tasks that models have not seen in their training data. The talk will cover task design, an evaluation pipeline in Python, a comparison of the latest models on accuracy, cost, and latency, and practical lessons for creating your own benchmark tailored to your needs.
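As a flavour of what such a harness involves, here is a minimal sketch of an evaluation loop that scores outputs and tracks latency. The tasks, the stubbed model, and the metric names are invented for illustration; Mercari's actual pipeline is not shown here.

```python
# Illustrative benchmark-harness core: run tasks, score outputs, and
# tally accuracy and latency. The model call is stubbed out; a real
# pipeline would call an LLM API here.
import time

tasks = [
    {"prompt": "2+2=?", "gold": "4"},
    {"prompt": "capital of Japan?", "gold": "Tokyo"},
]

def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call; answers one task wrongly."""
    return {"2+2=?": "4", "capital of Japan?": "Kyoto"}[prompt]

def evaluate(model, tasks):
    correct, latencies = 0, []
    for task in tasks:
        start = time.perf_counter()
        answer = model(task["prompt"])
        latencies.append(time.perf_counter() - start)
        correct += answer.strip() == task["gold"]
    return {"accuracy": correct / len(tasks),
            "avg_latency_s": sum(latencies) / len(latencies)}

print(evaluate(fake_model, tasks))  # accuracy 0.5: one of two answers matches
```

Swapping `fake_model` for different API clients is what makes side-by-side model comparison on cost and latency possible.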
Have you ever wished you could run Python directly in the browser—without any server backend or WebSocket setup?
In this talk, I will introduce Pyodide, a WebAssembly-based Python distribution that brings the full Python runtime to the browser. You’ll learn how Pyodide enables scientific computing, data visualization, and interactive notebooks—all in client-side web apps. We’ll explore how Pyodide is built, how to integrate it with JavaScript, and the real-world limitations to watch out for.
Whether you're building a no-backend playground, an AI-enabled tool, or an offline-capable app, Pyodide opens new doors for Python in the frontend.
pandas 3.0 is going to be released in the coming months. While pandas hasn't changed much since it became popular many years ago, pandas 3.0 does bring some interesting new features that will make working with data easier, faster, and more powerful. In this talk I will explain what's new in pandas 3.0.
In this talk, Python and quantitative methods are used to access, validate, and analyze fundamental financial data from the US Securities and Exchange Commission (SEC) EDGAR system. The SEC's JSON API provides structured financial data, derived from company filings reported in eXtensible Business Reporting Language (XBRL), an international standard for financial reporting. Pydantic is used for robust data validation. Attendees will learn to:
- Fetch basic metrics
- Calculate financial ratios
- Visualize trends
- Navigate common data challenges
While the talk focuses on US market data, a brief overview of the international landscape will also be provided. No finance background is required. Basic Python is required to understand the data processing part.
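As a taste of the ratio calculation step, here is a hedged sketch using data shaped loosely like the SEC's "companyconcept" JSON. The field names mirror EDGAR's structure, but the values and the `latest_value` helper are invented for illustration.

```python
# Hypothetical sketch: computing a current-style ratio from payloads
# shaped like EDGAR's companyconcept JSON. Values are illustrative,
# not real filings.

def latest_value(concept: dict, unit: str = "USD") -> float:
    """Return the most recently reported value for a concept."""
    facts = concept["units"][unit]
    return max(facts, key=lambda f: f["end"])["val"]

# Sample payloads mimicking EDGAR's structure.
assets = {"units": {"USD": [
    {"end": "2023-12-31", "val": 120_000_000},
    {"end": "2024-12-31", "val": 150_000_000},
]}}
liabilities = {"units": {"USD": [
    {"end": "2023-12-31", "val": 60_000_000},
    {"end": "2024-12-31", "val": 75_000_000},
]}}

current_ratio = latest_value(assets) / latest_value(liabilities)
print(f"Current ratio: {current_ratio:.2f}")  # 150M / 75M = 2.00
```

In the talk itself, a Pydantic model would validate these payloads before any ratio is computed.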
PyTorch is the most popular deep learning framework in both academia and industry. Researchers and engineers use PyTorch to train their models with its convenient and well-defined neural network APIs in eager mode. After training, one has to export the model's computation flow as a graph in a framework-agnostic format (e.g., ONNX, TensorRT, etc.) to deploy the trained model for inference with better performance. Thus, we need a mechanism to capture the computation operations that happen inside a DNN model.
Graph capturing aims to track and record the PyTorch API invocations used in a model. Since the release of PyTorch 2.0 in 2023, there have been two primary mechanisms for tracing a model's computation graph: fx.symbolic_trace and Dynamo. In this talk, I'd like to introduce the usage of both APIs and dissect the underlying machinery of the two approaches. We will also discuss the limitations of both, showing that fx.symbolic_trace is constrained by its static symbolic tracing mode and that Dynamo is still in its early stages.
So, you want to build an AI-powered application in Python. You will need a UI to accept user input, a database to store user data, and LLM deployments. You will also need to engineer prompts that guide the model to generate the desired output for your application. Yet, with all of these elements in place, there remains the immense challenge of coordinating their interaction. For a very simple task, filling a few user-inputted variables into a prompt for use with a single LLM deployment endpoint, the code might be only a few dozen lines in a single script. But what if you want to be able to swap between different models, deployments, or APIs? How will you handle problems such as hitting quota limits or encountering irregularities in LLM output, or test how you process inputs and outputs? And what if you want to build a more complex AI system, such as one that draws on internal or external data sources to enrich the model's context (RAG), or one that can dynamically generate additional prompts in response to user requests (agents)? These are all questions about how to build an AI pipeline. Several Python packages, including LangChain, LlamaIndex, and Haystack, claim to offer end-to-end frameworks for building sophisticated AI pipelines. The purpose of this talk is to present a systematic comparison of the alternatives available to Python developers for AI pipeline construction, including the option of simply writing a framework yourself.
Many hesitate to join tech communities or take on leadership roles, fearing they’re not "ready." My journey was anything but planned—I simply said "yes" to opportunities, even when unprepared. From attending meetups as a newcomer to becoming a core volunteer and now PyCon APAC 2025 Co-Chair, my story is one of commitment, uncertainty, and growth.
In this talk, I’ll share how showing up, embracing challenges, and volunteering transformed my career and community involvement. I'll also break down common misconceptions about tech meetups and show how anyone, regardless of experience, can start contributing. If you’re hesitant to take that first step, this talk will inspire you to leap, learn, and find your place in the Python and tech community.
Ever opened your music folder only to be greeted by a chaotic mix of album and track names that trigger your inner OCD? This talk explores the journey from manual cleanup to building a Python tool that brings order to musical chaos. By harnessing Retrieval-Augmented Generation (RAG), Large Language Models (LLMs), and natural language processing (NLP), the project intelligently standardizes music libraries across diverse naming conventions and languages. The presentation focuses on the creative process, core AI concepts, and key lessons learned, tracing the path from tedious folder cleaning to an automated, AI-powered solution.
Hong Kong’s Mandatory Provident Fund (MPF) is supposed to secure your retirement, but what if it’s secretly shrinking your savings? Using Python, we’ll simulate how high fees erode your returns over time, compare MPF performance against low-cost alternatives (like index funds), and uncover whether sticking with the default plan could cost you millions. Through interactive visualisations, we’ll expose the maths behind the fees and explore how self-directed investing might be an escape hatch.
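The core of such a simulation is simple compounding at a net-of-fee rate. The sketch below uses illustrative contribution, return, and fee figures, not actual MPF data.

```python
# Minimal sketch: how an annual fee drag compounds over a career.
# All figures (contribution, return, fees) are illustrative assumptions.

def final_balance(monthly: float, years: int, annual_return: float,
                  annual_fee: float) -> float:
    """Accumulate monthly contributions at a net-of-fee monthly rate."""
    net_rate = (1 + annual_return - annual_fee) ** (1 / 12) - 1
    balance = 0.0
    for _ in range(years * 12):
        balance = (balance + monthly) * (1 + net_rate)
    return balance

high_fee = final_balance(1500, 40, 0.06, 0.015)  # 1.5% p.a. fee fund
low_fee = final_balance(1500, 40, 0.06, 0.002)   # 0.2% p.a. index fund
print(f"Fee drag over 40 years: HK${low_fee - high_fee:,.0f}")
```

Even a one-percentage-point fee difference compounds into a substantial gap over a forty-year horizon.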
Terraform is powerful—but it has limits. In this talk, we’ll explore where Python can complement infrastructure-as-code (IaC): from dynamic resource creation and complex validations to custom workflows and post-deployment logic. Using real-world SRE/DevOps scenarios, we’ll walk through how Python scripts and tools (like Pulumi, Python CDK, or simple custom wrappers) help close Terraform’s gaps. This session is for anyone building infra who’s hit the wall with HCL and needs flexibility—without abandoning automation.
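As one example of closing a gap with a simple custom wrapper, here is a sketch of a post-plan check that flags destructive changes. The structure loosely mirrors `terraform show -json` output; the resource names are invented.

```python
# Hypothetical sketch: a validation step that is awkward to express in
# HCL. The plan dict loosely mirrors `terraform show -json` output;
# resource addresses are invented for illustration.

plan = {"resource_changes": [
    {"address": "aws_s3_bucket.logs",
     "change": {"actions": ["delete"]}},
    {"address": "aws_instance.web",
     "change": {"actions": ["update"]}},
]}

def destructive_changes(plan: dict) -> list[str]:
    """Flag resources the plan would destroy."""
    return [rc["address"] for rc in plan["resource_changes"]
            if "delete" in rc["change"]["actions"]]

blocked = destructive_changes(plan)
print(blocked)  # ['aws_s3_bucket.logs']
```

A CI job could run this check between `terraform plan` and `terraform apply` and fail the pipeline when `blocked` is non-empty.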
Kubernetes Isekai (異世界)
Kubernetes Isekai is a Python open-source RPG that turns learning Kubernetes into an epic adventure! Tailored for students, it offers:
- 🎮 Interactive gameplay with hands-on Kubernetes tasks
- 🧠 Dynamic NPC interactions that guide and challenge you
- 💻 Practical experience in a fun, gamified environment
- ☁️ Cost-effective deployment using AWS Academy Learner Lab or free GitHub Codespace
Step into a fantastical world where mastering Kubernetes is part of the quest. Learn by doing, explore by playing, and level up your cloud-native skills!
Unlock the power of observability in your software systems! This beginner-friendly, hands-on workshop introduces you to OpenTelemetry, the leading open-source observability framework. Designed for developers, operators, SREs, and even technical managers with little to no experience in observability, this session walks you through the fundamentals of observability, core OpenTelemetry architecture, and how to get started instrumenting your applications with minimal code.
By the end of this session, you’ll have a strong foundational understanding and practical exposure to OpenTelemetry concepts and implementation, equipping you to bring observability to your projects or teams.
Attendees will be provided with:
- A GitHub repo with sample applications and instrumented code
- Cheat sheets and quick-start guides
Large Language Models (LLMs) often struggle to provide current and comprehensive answers from vast, interconnected knowledge bases, a common challenge in fields like business, legal and administrative tasks. While traditional RAG improves LLM context, it can falter with complex, relationship-heavy information. GraphRAG offers a powerful solution by leveraging graph databases to enhance retrieval with structured relationships, leading to deeper contextual understanding.
This talk provides a practical introduction to implementing GraphRAG using Neo4j. We will explore how Neo4j can be used to construct knowledge graphs from unstructured data and enable advanced, relationship-aware retrieval. Attendees will learn the core concepts of GraphRAG and gain practical insights to build smarter RAG systems, capable of delivering more accurate and contextually rich LLM responses for complex real-world applications.
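To make "relationship-aware retrieval" concrete, here is a toy in-memory analogue. It does not use Neo4j or Cypher; the entities and relations are invented, and a real system would query a knowledge graph built from documents.

```python
# Toy in-memory analogue of relationship-aware retrieval (no Neo4j).
# Triples are invented for illustration.

triples = [
    ("Acme Corp", "ACQUIRED", "Widget Ltd"),
    ("Widget Ltd", "MAKES", "Widget X"),
    ("Acme Corp", "HEADQUARTERED_IN", "Hong Kong"),
]

def neighbours(entity: str, depth: int = 2) -> set[tuple[str, str, str]]:
    """Collect facts reachable from an entity within `depth` hops."""
    frontier, facts = {entity}, set()
    for _ in range(depth):
        nxt = set()
        for s, r, o in triples:
            if s in frontier or o in frontier:
                facts.add((s, r, o))
                nxt.update({s, o})
        frontier = nxt
    return facts

# Multi-hop context that a plain vector search might miss:
context = neighbours("Acme Corp")
print(len(context))  # 3: all facts reachable within two hops
```

The two-hop fact ("Widget Ltd", "MAKES", "Widget X") is exactly the kind of connected context that makes GraphRAG answers richer than similarity search alone.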
This talk presents a domain-specific DDoS mitigation approach combining DNS redirection, reverse proxy WAF, and kernel-level filtering with eBPF XDP via bpfilter. Instead of using BGP Flowspec, attacker IPs are identified at the origin, uploaded to a central IP list, and dynamically applied as XDP_HOOK rules using a Python-based service. This architecture enables efficient, low-resource blocking for phishing-injected gambling domains without requiring expensive infrastructure, making it ideal for organizations with limited network-layer control.
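The central-list step can be sketched in pure stdlib Python: normalize reported attacker IPs and collapse them into minimal CIDR rules for the filtering layer. The addresses are illustrative, and the eBPF/XDP attachment itself is outside this sketch.

```python
# Minimal sketch of the central blocklist step: deduplicate reported
# attacker IPs and aggregate them into minimal networks. Attaching the
# resulting rules via XDP is not shown here.
import ipaddress

reported = ["203.0.113.7", "203.0.113.8", "198.51.100.23", "203.0.113.7"]

def to_rules(ips):
    """Deduplicate and collapse addresses into minimal CIDR networks."""
    nets = [ipaddress.ip_network(ip) for ip in set(ips)]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]

print(to_rules(reported))
```

`ipaddress.collapse_addresses` merges adjacent ranges where possible, keeping the rule set pushed to the kernel as small as it can be.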
Large language models face fundamental limitations, including outdated training data, context window constraints, and a tendency to hallucinate. While vector similarity search is a popular solution through Retrieval-Augmented Generation (RAG), simple cosine similarity often fails to capture the nuanced relevance needed for high-quality LLM responses.
This talk explores how search capabilities can enable your LLMs to provide accurate answers. We examine how sophisticated retrieval methods serve as external memory, enabling LLMs to access vast knowledge bases beyond their training cutoff. Through hybrid search combining dense and sparse retrieval, semantic reranking, and query reformulation, we demonstrate how search becomes the critical bridge between LLMs and real-world information needs.
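One common way to combine dense and sparse rankings is Reciprocal Rank Fusion (RRF). The sketch below uses invented document IDs; a real system would get the two rankings from an embedding index and a keyword index such as BM25.

```python
# Sketch of hybrid retrieval via Reciprocal Rank Fusion (RRF), merging
# a dense (semantic) and a sparse (keyword) ranking. Document IDs and
# ranking orders are illustrative.

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists; a higher rank contributes a larger score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["doc_a", "doc_c", "doc_b"]   # embedding-similarity order
sparse = ["doc_b", "doc_a", "doc_d"]  # keyword (BM25-style) order
print(rrf([dense, sparse]))  # doc_a first: strong in both lists
```

RRF needs no score calibration between the two retrievers, which is why it is a popular baseline before moving to learned rerankers.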
Learn how YDB leverages the Model Context Protocol (MCP) to integrate AI models with database systems. This session explores the development of a Python-based MCP server that facilitates seamless interactions between large language models and YDB’s open-source Distributed SQL database management system. Learn about the design decisions, challenges faced, and the practical benefits of integrating MCP into data architecture. Ideal for developers and AI practitioners aiming to enhance AI capabilities with direct database access.
General understanding of Large Language Models (LLMs) and Database Management Systems (DBMSs) is assumed. No deep knowledge of Python itself is necessary.
As computer vision becomes more deeply integrated into real-world applications, understanding how and why models make decisions is no longer optional; it’s essential. This talk introduces practical interpretability techniques that bring transparency to computer vision models. We’ll explore popular methods such as Grad-CAM and LIME, and see how they reveal the inner workings of deep learning models through visual explanations. Through live demos, attendees will learn how these tools can be used to debug models, uncover biases, and build trust with users. Designed for Python developers of all backgrounds, this session makes a complex topic accessible even if you're new to machine learning or explainable AI.
Build a production-grade CI/CD pipeline using Python, Docker, Testcontainers, GitHub Actions, and Kubernetes. This workshop takes you from zero to deployment with real-world automation, testing, and monitoring. Perfect for all levels, it blends simplicity, storytelling, and powerful DevOps tools to boost your confidence in delivering reliable software.
Every time I type pip install -r requirements.txt in the terminal, I brace myself for failure: maybe the Python version doesn't match, maybe the packages have to be installed in a particular order, maybe someone hand-edited a line or two and never actually tested it. So many things can go wrong!
In fact, the Python community addressed this years ago: PEP 621 (Python Enhancement Proposals) lays out a brand-new approach that says goodbye to requirements.txt, replacing the old hell with pyproject.toml, and plenty of tools such as poetry and setuptools have adopted it.
Maybe you are already using poetry; maybe you are on poetry 1.x and not yet using a PEP 621 pyproject.toml; maybe you are still using requirements.txt! This talk will start from the basics and walk everyone, step by step, through the most portable and robust way to write project metadata in modern Python, so that the Python projects you write these days, whether serious builds, casual LLM experiments, or even vibe coding, stay solid and usable for a long time to come.
The new AI tools, such as GitHub Copilot and Cursor, have made programming easier. However, I have observed that these tools are quietly changing the Python language. I will give a few examples to explain and predict how the language will change, offer my opinion on why this is happening, and discuss what we can do about it.
In today’s data-driven world, handling large datasets efficiently is crucial for any developer or data scientist. Python's itertools module offers a suite of fast and memory-efficient tools for extensive data iteration, making it a great solution for managing large datasets without exhausting system resources. This talk will dive into several key functions of itertools, comparing them to native Python iterations. Attendees will learn how to streamline their code and reduce memory usage while maintaining high performance.
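A small taste of the lazy style the talk covers: the example below processes an (effectively infinite) stream without ever materializing a list of the whole input.

```python
# Sketch: lazily processing a large stream with itertools, instead of
# materializing everything in a list first.
import itertools

def big_stream():
    """Simulate a huge data source without holding it in memory."""
    return itertools.count(start=1)  # 1, 2, 3, ... forever

# Take the first five even squares without building any large list.
evens = (n * n for n in big_stream() if n % 2 == 0)
first_five = list(itertools.islice(evens, 5))
print(first_five)  # [4, 16, 36, 64, 100]
```

Because `count`, the generator expression, and `islice` are all lazy, memory use stays constant no matter how large the stream is.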
Vibe coding is great when it works, but less so when it leads to security vulnerabilities. This beginner-friendly introduction will equip you with the knowledge to make your Python code more secure. We will explore common ways Python projects can become susceptible to issues—from dependency risks to malicious data—and provide practical, actionable recommendations on tools and strategies to safeguard your work. Leave feeling more confident and less paranoid about your code.
While Python's object-oriented paradigm dominates developer discussions, its functional capabilities offer powerful tools for modern development. This talk explores how integrating functional programming (FP) concepts can enhance code readability and maintainability in Python projects. We will discover how to transform imperative logic flow into composable functions and implement declarative error-handling using intuitive railway patterns. This talk aims to provide an optional toolbox for Python developers to improve code quality.
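As a flavour of the railway pattern, here is a minimal sketch. The `("ok", value)` / `("err", message)` tuple convention and the step functions are invented for illustration; they are not a standard library API.

```python
# Minimal railway-oriented sketch: each step returns ("ok", value) or
# ("err", message), and `bind` short-circuits on the first error.

def bind(result, fn):
    """Apply fn only on the success track; pass errors through."""
    return fn(result[1]) if result[0] == "ok" else result

def parse_int(s):
    clean = s.strip().lstrip("-")
    return ("ok", int(s)) if clean.isdigit() else ("err", f"not a number: {s!r}")

def reciprocal(n):
    return ("ok", 1 / n) if n != 0 else ("err", "division by zero")

def pipeline(raw):
    result = ("ok", raw)
    for step in (parse_int, reciprocal):
        result = bind(result, step)
    return result

print(pipeline("4"))     # ('ok', 0.25)
print(pipeline("zero"))  # error from the first step
print(pipeline("0"))     # error from the second step
```

Errors ride the failure track past all remaining steps, so each step only ever handles valid input, which is the readability win the talk describes.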
Ever wondered how to create your own domain-specific language tailored to your needs?
This session explores how to build a programming language that lets users write code using natural language. Designed to be fun, intuitive and easy to use, the language is crafted for young learners especially those for whom English is not a first language. Built with Python, the system parses and interprets natural language into executable code. I’ll share the design challenges, key insights and lessons learned from creating a language that’s user-centered.
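To illustrate the parse-and-interpret idea, here is a toy interpreter for two invented English-like commands. The vocabulary and command set are made up for illustration; a real language for learners would need a far richer grammar.

```python
# Toy sketch of interpreting natural-language commands into actions.
# The two commands ("set X to N", "add 1 to X") are invented examples.

def interpret(line: str, env: dict) -> None:
    words = line.lower().split()
    if words[:1] == ["set"] and "to" in words:
        # e.g. "set score to 10"
        name = words[1]
        env[name] = int(words[words.index("to") + 1])
    elif words[:2] == ["add", "1"] and words[-1] in env:
        # e.g. "add 1 to score"
        env[words[-1]] += 1

env: dict = {}
interpret("set score to 10", env)
interpret("add 1 to score", env)
print(env)  # {'score': 11}
```

Even this tiny example surfaces the central design challenge: natural language is ambiguous, so the interpreter must decide which phrasings it accepts and how to fail gracefully on the rest.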
Are you tired of your code working in your environment, but failing when your teammate tries to run it? Are you tired of typing commands into your bash terminal again and again just to run your test cases? Are you tired of pushing your code to a shared repo, just to find out it failed the CI checks put in place by the repo's admin?
If you've ever encountered any of these situations, you may benefit from having tox in your CI workflow. With tox, you can set up virtual environments to automatically run your format checks and test cases - all in a single file. It's a simple and elegant solution that can save you and your team time and frustration.
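As an illustration of "all in a single file", here is a small example configuration. The environment names, dependencies, and paths are placeholders; adjust them to your project's layout.

```ini
# Illustrative tox.ini: env names, deps, and paths are examples only.
[tox]
env_list = format, py311

[testenv]
deps = pytest
commands = pytest tests/

[testenv:format]
deps = ruff
commands = ruff check src/ tests/
```

Running `tox` then builds each virtual environment and runs its commands, so every teammate and the CI server execute the exact same checks.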
The rapid evolution of Python presents version selection challenges for developers. We will talk about the development process of Python versions, focusing mainly on the key changes from Python 3.10 to 3.14. This talk will highlight the most impactful features and their implications for real-world development, helping developers make informed decisions about upgrading their Python version.
Nobody likes waiting around, especially when code is silently crunching away. For data scientists, engineers, and anyone running long-loop operations in Python, the endless blank console can be a source of much anxiety and frustration.
This lightning talk introduces tqdm, Python's secret weapon for banishing the black box. We'll quickly explore how this deceptively simple library transforms mundane loops into engaging, informative progress bars, providing crucial real-time feedback. You'll walk away from this talk understanding how a few simple lines of code can dramatically improve the user experience of your scripts and boost your productivity.
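Using tqdm really is one line: wrap any iterable in `tqdm(...)` and iterate as usual. To show the idea behind it, here is a tiny hand-rolled progress bar in pure stdlib Python; tqdm adds rates, ETA estimates, and proper terminal handling on top of this.

```python
# Hand-rolled sketch of what a progress bar does: wrap an iterable,
# count items as they pass through, and redraw a bar on stderr.
import sys

def progress(iterable, total: int, width: int = 20):
    for i, item in enumerate(iterable, start=1):
        filled = width * i // total
        bar = "#" * filled + "-" * (width - filled)
        sys.stderr.write(f"\r[{bar}] {i}/{total}")
        yield item
    sys.stderr.write("\n")

result = 0
for n in progress(range(5), total=5):
    result += n  # stand-in for real work
print(result)  # 0+1+2+3+4 = 10
```

Because the wrapper is a generator, the work loop itself is untouched, which is the same property that makes tqdm a drop-in addition to existing code.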