PyCon AU 2025
The Data & AI specialist track is all about the technology, skills, and practices that engineers, data scientists, and researchers need when building effective and reliable solutions that involve data and leverage different flavours of machine learning and automation. We’re excited to hear talks that cover a wide range of topics relating to data science, data engineering, ML engineering, and modern artificial intelligence.
Welcome to the Education track!
Python has become one of the cornerstone languages for developing scientific software, thanks to its flexibility, extensibility, ease of use and extensive ecosystem. Whether you’re doing machine learning, processing & visualising data, or running a statistical analysis, there’s a good chance there's a Python package that can help get you to a result quickly.
This track is for anyone using Python for scientific computing - be it data analysis, engineering, academic research, statistics, modelling systems, machine learning, or just generally hacking together new tools to extract insight.
How do you reproducibly identify and count individual neurons in a brain region that’s tiny, diffuse, and surrounded by lookalike regions, especially when each 3D brain image is multiple terabytes in size?
This talk explores this very question by diving into the development of a Python-based, end-to-end pipeline for analysing whole mouse brains imaged using light-sheet fluorescence microscopy. The goal is to quantify the number of individual dopaminergic neurons in the substantia nigra pars compacta (SNpc), a small but clinically significant midbrain region implicated in Parkinson’s disease.
Built entirely with open-source Python tools, the workflow combines brainreg (from the BrainGlobe ecosystem) for atlas-based registration, dask for scalable image processing, and a custom-trained Cellpose model for 3D cell segmentation. To address the complexity of region extraction and alignment uncertainty, the pipeline includes parameter sweeps, pre-processing optimisation, and quantitative evaluation using expert-labelled ground truth masks.
This talk will also highlight how the integration of multiple Python open-source packages supports scalable, reproducible neuroimaging analysis, from parallel execution on HPC clusters to image registration and deep learning-based segmentation pipelines, as well as quantitative methods for assessing alignment fidelity.
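For readers curious what the out-of-core side of such a pipeline can look like, here is a minimal, hypothetical sketch using dask.array. The toy volume, chunk sizes and stand-in segmentation function are illustrative only; the real workflow uses Cellpose and brainreg as described above.

```python
import dask.array as da
import numpy as np

# Hypothetical chunked volume; the real pipeline lazily loads terabyte-scale light-sheet data.
volume = da.random.random((1024, 1024, 1024), chunks=(256, 256, 256))

def segment_block(block):
    """Stand-in for the per-chunk segmentation step (Cellpose in the real workflow)."""
    return (block > 0.99).astype(np.uint8)

# map_overlap keeps a halo around each chunk so cells on chunk borders are not split.
labels = da.map_overlap(segment_block, volume, depth=16, dtype=np.uint8)

# Triggers the parallel computation; real cell counting would label connected components.
foreground_voxels = labels.sum().compute()
print(foreground_voxels)
```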
In this talk, we’ll explore four modern data engines (Daft, DuckDB, Polars, and DataFusion) that offer varying levels of out-of-core execution. These engines allow you to work with datasets larger than memory without rewriting everything for a distributed system.
They’re fast, expressive, and, above all, Pythonic—though SQL support might still be the deciding factor for many workflows.
Rather than comparing which engine is best (they’re all open source, and good ideas tend to spread quickly), this talk will highlight exciting recent developments. We’ll showcase, for example, how one workload saw a 2× performance boost in under a year, and how open table formats are bringing cloud data warehouse capabilities to local Python workflows.
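As a flavour of what "larger than memory, no cluster required" looks like in practice, here is a minimal lazy-execution sketch with Polars; the file glob and column names are invented for illustration:

```python
import polars as pl

# Nothing is read yet: scan_parquet builds a lazy query plan over many files.
lazy = (
    pl.scan_parquet("events/*.parquet")
      .filter(pl.col("status") == "error")
      .group_by("service")
      .agg(pl.len().alias("errors"))
)

# The streaming engine processes the data in batches rather than loading it all at once.
print(lazy.collect(streaming=True))
```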
When teaching computer science, many students struggle with the wide variety of concepts. One week you're learning programming, the next you're learning cybersecurity, then you're learning about image file formats. It can be hard to see how these topics are related, or if they're even related at all!
Without sufficient context, these are just abstract topics, disconnected from reality and from each other, with rote memorisation the only tool at hand to help you get things right on the exam. With context, however, things start to make sense. You understand how the topics relate, how these technologies developed and evolved, and can derive solutions rather than memorising them.
This context – "why" is the technology the way it is? – fundamentally changes how students connect with the subject, but with curriculums so jam packed it can be hard to justify the time to go into this level of detail. I found, however, that when I did cover the "why", I spent less time on revision, more time on discussion, and students had better results.
I want to share with you a few examples of how I did this, what happened when I discussed those in class, and how I go about finding the "why" when I don't already know what it is. Hopefully, I can give you some new tools for your own teaching, and – critically – explain the "why" behind these tools too.
In this session we will share our experience of using Apache Airflow to build production scientific modelling workflows. This will draw on our work at the Australian Bureau of Meteorology on multiple projects which updated existing services to use Airflow – the eReefs water quantity and quality modelling and Seasonal Streamflow Forecasting services.
Why invest the effort to learn the Airflow framework and then apply it to manage your scientific workflows? In our years of experience, building workflows around scientific analysis applications that are both operational-quality and enjoyable and productive for scientific developers to work with has been a persistent pain point.
As scientific developers, if you roll your own workflow management system from the ground up then you retain control and can use all your favourite Python tools - but over time it can often result in a combination of scripts, cron and/or Jenkins jobs that is hard to maintain. You’ll also be short of features you need in an operational-quality system like good logging, error handling, and a pleasant monitoring web UI for non-developers (e.g. application support teams) to use. All the above is exacerbated when effective task parallelisation is a goal. On the other hand, applying off-the-shelf general business IT workflow management apps to scientific modelling use-cases can result in cumbersome systems that are difficult to update and involve a lot of duplication.
Enter Apache Airflow - an open-source workflow manager written in Python, with workflow Directed Acyclic Graphs (DAGs) defined directly in Python code. We’ll give examples, drawn from our project work, of updating existing systems to run in an Airflow framework with the goal of enabling greater automation, scalability and quality control. These include:
- Challenges faced getting started with Airflow for our small project teams – including tips for setting up development instances of Airflow’s scheduler and workflow backends.
- Summaries of how we used the different Airflow “Operators” to invoke program code – including trade-offs between tight and loose coupling, and how this interacts with the use of Conda for managing complex scientific software stacks.
- Our experience of using Airflow’s workflow parallelisation effectively for chunking up work.
- Experience from different deployment options – both to AWS cloud containers, and locally-managed Virtual Machines.
We'll finish with reflecting on key lessons learnt, and ideas for further improvement in scientific software workflow management.
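For readers who have not met Airflow before, a DAG really is just Python. A minimal sketch (the task names, commands and schedule are illustrative, not the Bureau's actual pipelines) might look like:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

def check_outputs():
    # Placeholder for output validation logic.
    print("outputs look sane")

with DAG(
    dag_id="daily_model_run",          # hypothetical name
    start_date=datetime(2025, 1, 1),
    schedule="@daily",                 # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    prepare = BashOperator(task_id="prepare_inputs", bash_command="echo preparing inputs")
    run_model = BashOperator(
        task_id="run_model",
        # Loose coupling: invoke the model inside its own Conda environment.
        bash_command="conda run -n model python run_model.py",
    )
    check = PythonOperator(task_id="check_outputs", python_callable=check_outputs)

    prepare >> run_model >> check      # task dependencies, declared in plain Python
```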
I’ve used pandas for years, but as my data grew, my local workflows started to slow down. Joins got sluggish, memory errors showed up, and simple tasks became harder to manage. That’s when I found DuckDB, a fast, in-process SQL engine that brought the speed and flexibility I was missing.
This talk isn’t about replacing pandas. It’s about knowing when to reach for something different. I’ll share how DuckDB helped streamline my workflow, with real examples, side-by-side comparisons, and a quick intro to DuckLake, a SQL-based Lakehouse format that fits naturally into modern Python analytics.
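As a taste of that workflow, here is the kind of query the talk has in mind: DuckDB scanning Parquet files directly, without loading them all into memory first. The file glob and columns are made up for illustration.

```python
import duckdb

con = duckdb.connect()

# Query the files in place; DuckDB streams and parallelises the scan itself.
result = con.sql("""
    SELECT category, count(*) AS n, avg(price) AS avg_price
    FROM read_parquet('events/*.parquet')
    GROUP BY category
    ORDER BY n DESC
""").df()   # hand back a pandas DataFrame when you want one

print(result.head())
```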
I work for a local council as an educator and mentor at The Lab - a social group for autistic teenagers with an interest in technology. Operating out of a makerspace at a public library, we allow participants free use of tools so that they can tinker, experiment, and build their skills, both social and technical. I see these students gain confidence, discover and ignite new passions, and grow their independence.
It isn't all easy, however. Providing a safe space for neurodivergent teens has its challenges. We have had to support participants through emotional instability, interpersonal clashes driven by contradicting neurodivergent sensitivities like noise and light levels, communication barriers, and more. However, with a little patience, when given the right support, we have seen participants bloom, stepping out of their comfort zone and sometimes even developing their own projects.
I'll share some reflections on what participants at The Lab do, how we structure the sessions, and the considerations we take to foster a neurodivergent-friendly learning environment - hopefully this provides you with some insights that can help you to improve your pedagogy too.
I believe that by creating an inclusive environment for neurodivergent people, we make it easier to learn for everyone. After all, accessibility for one is accessibility for all.
We’re all building AI features now – or will be soon. But working with teams who are building with LLMs brings its own challenges – namely: how can we bring in the latest research, consider AI ethics, and weigh the cost of different models without blowing past delivery dates? Not to mention making sure that the features we build will be stable, reliable, and maintainable in the future.
In this talk, I’ll share a case study of how we built our first LLM feature. In one month, we ran experiments, developed evaluation methods, assessed the risks, and worked through the ethical concerns needed to build the feature. Specifically, over this period we did a literature review, consultation with academic experts, data labelling, model experimentation, a cost assessment, and finally, all the ML engineering to launch it into production. The outcome: <1% extreme misclassification and zero hallucinations.
In this talk, we’ll share our approach to building LLM features – how we partnered with academia (without being delayed by their timelines), what tooling we used, and how we made the cost and timeline tradeoffs to keep business stakeholders happy. As one example, we’ll share how important evaluation data was for building our features, because it helped us improve our definitions and revealed gender differences in how people perceive feedback. We’ll share the principles we used when balancing rigorous, robust practices with cost and timeline considerations. Finally, we’ll share which frameworks actually helped us make the right calls, avoid expensive do-overs, and navigate the AI ethics side as well.
You'll walk away with hands-on tools for leading the AI conversation within your own organization – including how to identify ethical issues early, address them efficiently, and still deliver on time and on budget.
Object-oriented programming can feel impossibly abstract to students, and demoralising to the teacher facing a class of blank stares, until you bring in Pikachu. This presentation shows how Pokemon naturally demonstrates every OOP concept we might struggle to teach, from basic classes to inheritance.
Instead of the standard "Car" and "Animal" examples, students work through boilerplate code to predict, debug and eventually create classes and objects, to design and build battle systems.
You'll see student work samples, and walk away with some different ideas on how to approach teaching OOP.
Water quality modeling plays a crucial role in managing aquatic ecosystems. Those models are based on Computational Fluid Dynamics (CFD), which uses numerical analysis and data structures to simulate the many processes influencing fluid behavior. CFD requires complex mathematical representations solved through various ODE and PDE solvers, and has traditionally been written in Fortran or C++, mostly for their speed in massively parallel computations such as large array operations (with C++ adding object-oriented features). As such, those codebases have a long-standing legacy in scientific computing, and many established CFD codes are written in them. However, as time goes on, we witness the popularity of a new generation of languages such as Python, which stems from its simplicity, versatility, and strong community. Extensive libraries and frameworks make Python a popular choice for many developers and scientists alike. It is, however, not a preferred language for core CFD code due to performance limitations and its difficulties with parallelization.
This talk explores modernizing a 25+ year-old C++ CFD model, essential for simulating lake conditions, particularly for predicting temperature variations and algal bloom dynamics, by wrapping it with C-interop for seamless Python interoperability.
Beyond integration, we leverage Optuna, a powerful hyperparameter optimization framework, to fine-tune models efficiently, transitioning from manual parameter tuning on a laptop to a distributed, scalable workflow powered by Dask Distributed and JupyterHub. This transformation enables automated hyperparameter optimization across many lakes in Australia, helping researchers investigate trends in tuning parameters and derive deeper environmental insights.
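One possible shape of such a distributed tuning loop, sketched with Optuna and dask.distributed: the parameter names, the run_lake_model wrapper and the scheduler address are assumptions for illustration, and a real cluster would use a shared networked database rather than SQLite.

```python
import optuna
from dask.distributed import Client

# Shared storage lets many workers contribute trials to one study.
STORAGE = "sqlite:///tuning.db"   # replace with a networked DB for a real cluster
study = optuna.create_study(
    study_name="lake_model", storage=STORAGE,
    direction="minimize", load_if_exists=True,
)

def run_lake_model(light_extinction, wind_drag):
    """Hypothetical wrapper: run the wrapped C++ model and return an error metric."""
    return abs(light_extinction - 0.8) + abs(wind_drag - 0.002)

def objective(trial):
    kw = trial.suggest_float("light_extinction", 0.1, 2.0)
    cd = trial.suggest_float("wind_drag", 0.0005, 0.005)
    return run_lake_model(kw, cd)

def tune(n_trials):
    s = optuna.load_study(study_name="lake_model", storage=STORAGE)
    s.optimize(objective, n_trials=n_trials)

client = Client()   # or Client("tcp://scheduler:8786") for a remote cluster
futures = [client.submit(tune, 25, pure=False) for _ in range(4)]
client.gather(futures)
print(study.best_params)
```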
After a number of my Year 10 students entered a capture the flag cybersecurity competition, they had a sudden burst of enthusiasm for the topic. Networking and cybersecurity, being parts of the national curriculum, are things that I should probably be teaching anyway, so when it came to delivering this unit I figured I'd capture their enthusiasm by running my assessment as a capture the flag. This seemed like a good idea at the time, but as it turns out CTFs take quite a bit of effort - and getting a system like the micro:bit to reliably imitate a network takes a bit of work. Also creating a scenario where students don't just troll each other in the middle of an exam... also figuring out how to stop them cheating when their communication is open... also figuring out a way to teach them everything they needed to know in order to even approach the task... also how do I learn all this stuff first?
Join me as I go through the five weeks I spent learning, delivering and then assessing cybersecurity things.
I'll be overviewing what I had to teach to get them ready, what I had to code on the micro:bits to make any of this work, and ultimately ask: how effective is it to run a capture the flag using micro:bits?
What do you do when you’ve captured millions of aerial images across thousands of kilometers and want to find the handful of problems that could take out a state’s power supply? This talk explores how we built a full-stack system — combining machine learning, Django, and self-hosted infrastructure — to identify critical anomalies in Australia’s electricity transmission network.
You’ll learn how we perform image processing at scale, reducing an overwhelming data problem into a handful of insights that matter, and how we present these results to users.
At synchrotron facilities, researchers use high-energy X-rays to uncover the structure and chemistry of materials — from batteries to biological systems. These experiments generate vast amounts of complex, high-dimensional data, and scientists need flexible tools, interactive exploration, and smart automation to make sense of it all. In this talk, we present a Python-based data processing suite designed to support X-ray spectroscopy workflows: from generating immediate data products to inform critical decisions during limited beamtime, to in-depth reprocessing off-site — or even offline.
The suite is built around a processing library that uses xarray for intuitive multi-dimensional data handling and provides reproducible, transparent analysis pipelines via command-line tools and Jupyter notebooks for coding-affine users. It also includes a PyQt-based desktop GUI that enables domain scientists to interactively explore their data and fine-tune processing steps without the need for prior programming experience.
We'll explore how Python libraries like xarray, pyqtgraph, and typer take the processing of raw experimental data from low-level wrangling in multiple dimensions to high-level exploration in intuitive interfaces. By providing different levels of balance between automation and user control, we enable scientists with all levels of programming experience to create insightful datasets — whether they're at the beamline, back at their home institution, or already on the plane to the next conference.
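To illustrate why xarray is a good fit for this kind of data, here is a small, hypothetical example; the file name, variable and dimension names are invented, not the suite's actual schema.

```python
import xarray as xr

# Hypothetical NetCDF of spectroscopy data with dimensions (energy, x, y).
ds = xr.open_dataset("scan_0001.nc")
spectra = ds["counts"]

# Labelled dimensions make the intent obvious: normalise each pixel's spectrum,
# then average over a region of interest, without juggling axis numbers.
normalised = spectra / spectra.max(dim="energy")
roi_mean = normalised.sel(x=slice(100, 200), y=slice(50, 120)).mean(dim=["x", "y"])

roi_mean.plot()   # quick look at the averaged spectrum
```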
When academic engineering tools remain trapped in Matlab while the industry relies on Excel, Python offers a unique opportunity to bridge this computational divide. While academic software packages for the efficient design of light-gauge steel have been available for several years, they effectively require highly specialised expertise to use and understand—limiting the material’s adoption beyond highly specialised buildings. This talk presents a case study of transforming CUFSM, a trusted but highly technical Matlab package for the finite strip analysis of light-gauge steel, into pyCUFSM—a performant and open-source Python implementation deployable to AWS Lambda.
The migration revealed and addressed several key scientific Python challenges. Moving from Matlab’s matrix-oriented syntax to idiomatic Python required not only converting every index from 1-based to 0-based, but also careful architectural decisions around NumPy array handling and SciPy linear algebra operations. Performance bottlenecks were addressed through strategic use of Cython compilation and aggressive pre-allocation and reuse of data structures, achieving a p90 duration of 3.9 seconds for real-world usage.
However, technical performance alone wasn’t sufficient. In regulated industries where engineers face personal liability for design failures, trust and transparency are paramount. Not only must the interpretation of inputs and outputs not leave any doubt, but the calculations themselves must be validatable by the user - without having to read or understand code. By coupling this Python analysis package with a React-based frontend, a calculator was developed which requires only 4 inputs, shows all intermediate steps with full rendered equations, and still allows access to advanced parameters for those who want it. Extensively validated, such an implementation is showing significant gains in adoption by the wider structural engineering community. The final system demonstrates how Python's strengths—from scientific libraries to cloud deployment—can bring academic innovation into wider use in regulated industries where reliability, ease of use, and user trust are essential.
Networks have in recent years emerged as an invaluable tool for describing and quantifying complex systems in many branches of science. Recent studies suggest that networks often exhibit hierarchical organization, where vertices divide into groups that further subdivide into groups of groups, and so forth over multiple scales. Here we present a general technique for inferring hierarchical structure from network data and demonstrate that the existence of hierarchy can simultaneously explain and quantitatively reproduce many commonly observed topological properties of networks (e.g. terrorist networks), such as right-skewed degree distributions, high clustering coefficients, and short path lengths. We further show that knowledge of hierarchical structure can be used to predict missing connections in partially known networks with high accuracy, and for more general network structures than competing techniques. Taken together, our results suggest that hierarchy is a central organizing principle of complex networks, capable of offering insight into many network phenomena.
Shock horror! Banana, the beloved science class Python has been released! But how? And more importantly... who did it? The clues are hidden in a collection of spreadsheets, but we’ll need something a little more powerful than VLOOKUP to solve this slippery case.
Using Python in Excel (or Jupyter notebooks if you prefer!) will help us close the book on the tale of the missing snake while sneaking in real-world coding, logic, and data wrangling. Learn about the mystery I’ve crafted and how you can get your own students coding with this (or your own) data adventure!
Frustrated at the inevitable paralysis of "which GUI library will my students hate the least", excited about the possibilities of a new toy in Streamlit and drunk on the power of having recently learned about Github Codespaces, our intrepid educator ventured out on a journey of discovery and (mis)adventure. Would interactive data dashboards be the key to unlocking engagement? Would students ever learn to commit their work? Would the tiniest little technical snag derail entire lessons?
Time series data is everywhere. Across industries such as environmental monitoring, financial market analysis, power and energy systems, and scientific discovery, organisations rely on analysing large volumes of complex time series data to make smart and informed decisions that help keep the world running smoothly. Two of the most critical tasks in time series analysis are Time Series Classification and Time Series Forecasting. Python’s data science ecosystem for time series analysis has grown significantly in recent years. In this talk, we will introduce the modern landscape of time series tools available in Python. We will demonstrate the usability, algorithmic diversity, and interface design of libraries such as sktime, aeon, and Nixtla (NeuralForecast, MLForecast). These libraries will serve as examples to show how easy they are to use, what kinds of algorithms they provide, and how their application programming interfaces are structured to support efficient and intuitive development. Whether your goal is to classify environmental patterns or forecast future trends, these tools can simplify and accelerate your time series analysis workflow.
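As a flavour of the interface design the talk will cover, here is a small sktime forecasting example in the style of its documentation: a seasonal-naive baseline on a toy dataset.

```python
from sktime.datasets import load_airline
from sktime.forecasting.naive import NaiveForecaster

y = load_airline()                                    # monthly airline passengers, a classic toy series
forecaster = NaiveForecaster(strategy="last", sp=12)  # repeat last season as a baseline
forecaster.fit(y)

y_pred = forecaster.predict(fh=[1, 2, 3])             # forecast the next three months
print(y_pred)
```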
AI can draft product descriptions, handle customer support, and create impressive summaries, yet in production we still struggle to answer a basic question: is this output measurably any good? Traditional accuracy metrics fall short when your LLM needs to write engaging content or your image model is expected to produce aesthetic results. And checking outputs manually? That doesn't scale.
This talk is a friendly introduction to building evaluation loops that work for GenAI models. We'll explore everyday examples such as grading LLM summaries and judging whether chatbot responses help or frustrate, showing why deterministic metrics fail for open-ended outputs. From there, we outline a three-part approach combining simple metrics for quick first-pass evaluation, human-preference samples for nuance, and repeatable tests that run with every model change.
To ground these ideas, we'll walk through a real project that turns elevation data into high-quality Swiss-style relief maps. The domain may seem niche, but the lessons learned - balancing automation with human judgment, tracking non-deterministic outputs, and iterating quickly without drowning in data - apply to every project. You’ll leave with a mental checklist and a starter toolkit for proving that your GenAI output is getting better, not just different.
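A minimal sketch of the "simple metrics first, repeatable on every change" idea; the checks and file names are invented stand-ins, not the project's actual evaluation suite.

```python
import statistics

def score_summary(summary: str, source: str) -> dict:
    """Cheap, deterministic first-pass checks; nuance comes from sampled human review."""
    words = summary.split()
    return {
        "length_ok": 30 <= len(words) <= 120,
        "novel_tokens": len(set(words) - set(source.split())) / max(len(words), 1),
    }

def run_eval(cases, generate):
    """Re-run the same frozen cases every time the model or prompt changes."""
    results = [score_summary(generate(c["source"]), c["source"]) for c in cases]
    return {
        "pct_length_ok": statistics.mean(r["length_ok"] for r in results),
        "avg_novel_tokens": statistics.mean(r["novel_tokens"] for r in results),
    }

# Example usage (hypothetical):
# cases = [{"source": open(p).read()} for p in eval_paths]
# print(run_eval(cases, my_model.summarise))
```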
Follow my journey from drowning in the world of data structures to understanding through implementation. Skip Lists... What are they? Why are they? And how do they work? These are all questions I sought to answer after I discovered them in the wild. Come and find out why I think writing your data structures from scratch might be the best way to understand them.
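For the curious, here is roughly what "implementing it yourself" can look like: a compact skip list supporting insert and lookup. This is a teaching sketch, not production code.

```python
import random

class Node:
    def __init__(self, value, level):
        self.value = value
        self.forward = [None] * (level + 1)   # one "next" pointer per level

class SkipList:
    MAX_LEVEL = 4
    P = 0.5   # probability of promoting a node to the next level up

    def __init__(self):
        self.head = Node(None, self.MAX_LEVEL)
        self.level = 0

    def _random_level(self):
        lvl = 0
        while random.random() < self.P and lvl < self.MAX_LEVEL:
            lvl += 1
        return lvl

    def insert(self, value):
        update = [None] * (self.MAX_LEVEL + 1)
        node = self.head
        # Walk down from the top level, remembering where we step down.
        for i in range(self.level, -1, -1):
            while node.forward[i] and node.forward[i].value < value:
                node = node.forward[i]
            update[i] = node
        lvl = self._random_level()
        if lvl > self.level:
            for i in range(self.level + 1, lvl + 1):
                update[i] = self.head
            self.level = lvl
        new = Node(value, lvl)
        for i in range(lvl + 1):              # splice the new node into each level
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def contains(self, value):
        node = self.head
        for i in range(self.level, -1, -1):   # skip ahead on the sparse upper levels
            while node.forward[i] and node.forward[i].value < value:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.value == value

sl = SkipList()
for v in [3, 7, 1, 9]:
    sl.insert(v)
print(sl.contains(7), sl.contains(4))   # True False
```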
Clinical data harmonisation efforts are an extraordinarily powerful tool in the world of observational research. When your data model is designed to do everything, however, there is a necessary trade-off in design principles. The requirement to support every possible use-case across all clinical domains means that they can tend to favour flexibility over clarity, storing events, measurements, treatments, and outcomes in highly normalised, loosely typed schemas. For domain experts like oncology researchers or clinicians, this makes even basic questions (say, “what happened to this patient, when, and why?”) frustratingly opaque.
Using Python’s ORM paradigm, we created a more intuitive, opinionated view of oncology data. By surfacing richly connected objects like CancerPatient, CancerDiagnosis, Regimen, or Cycle, we move away from brittle SQL scripts and toward a model that reflects how clinical experts already think. These ORM-backed tools not only support reproducible ETL and visualisation workflows, but also allow non-developers to explore complex patient journeys in a hands-on, object-based way. We’re building out a library of reusable object maps that encode domain knowledge directly, letting researchers focus on clinical questions and not worry about the nuanced query logic.
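To give a sense of the approach, here is a hedged sketch in the style of SQLAlchemy's declarative ORM; the class and column names are simplified stand-ins, not the project's actual object maps.

```python
from sqlalchemy import ForeignKey
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship

class Base(DeclarativeBase):
    pass

class CancerPatient(Base):
    __tablename__ = "cancer_patient"
    id: Mapped[int] = mapped_column(primary_key=True)
    diagnoses: Mapped[list["CancerDiagnosis"]] = relationship(back_populates="patient")

class CancerDiagnosis(Base):
    __tablename__ = "cancer_diagnosis"
    id: Mapped[int] = mapped_column(primary_key=True)
    patient_id: Mapped[int] = mapped_column(ForeignKey("cancer_patient.id"))
    site: Mapped[str]
    patient: Mapped[CancerPatient] = relationship(back_populates="diagnoses")

# A researcher can then ask "what was this patient diagnosed with?" in plain Python:
# with Session(engine) as session:
#     patient = session.get(CancerPatient, 42)
#     for dx in patient.diagnoses:
#         print(dx.site)
```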
This presentation will introduce the audience to the fields of electricity market modelling and mathematical optimisation (MO). The talk will consist of three sections:
The first section will provide some background and context for the audience. We will describe what an electricity market is, how it functions, and the energy transition which is changing how we build and operate the grid. We will also provide some background on mathematical optimisation - how it works, and why it is the most common way to model energy markets.
In the second section, we will walk through how to build an optimisation model together. We will describe an extremely simple electrical grid, and show how we can simulate the dispatch of power stations through a mathematical optimisation model. We will also explore some of the benefits of these models, including how electricity prices are generated by extracting the marginal value of constraints.
Finally, we will break down the core components of a mathematical optimisation model and discuss how this approach can be used to solve quantitative problems in many other sectors.
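As a preview of the second section, a toy dispatch problem can be written as a linear program in a few lines; the generator costs, capacities and demand below are invented numbers, not a real market.

```python
from scipy.optimize import linprog

# Three generators: cost ($/MWh) and capacity (MW); one demand to meet.
costs = [20.0, 50.0, 120.0]          # e.g. wind, gas, peaking plant
caps = [(0, 100), (0, 80), (0, 60)]
demand = 150.0

# Minimise total cost subject to: total dispatch == demand, 0 <= dispatch <= capacity.
res = linprog(c=costs, A_eq=[[1, 1, 1]], b_eq=[demand], bounds=caps, method="highs")
print("dispatch (MW):", res.x)

# The marginal value (dual) of the demand constraint recovers the market price;
# sign conventions vary by solver, so inspect it rather than assuming.
print("price signal ($/MWh):", res.eqlin.marginals[0])
```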
After over a decade of teaching application development, I became a teacher in a local college a few years ago. Over 12 months, I taught open-source programming languages, including Python, alongside frameworks and content management systems, to various student cohorts. This session will share takeaways from that experience.
The goal of the presentation is to explore the current link between the open-source community and academia, examine certain educational institution stigmas, and improve how we, as a community, present Python to teachers and students.
Free-form data entry is a realm that is simply begging for data that you’ll never be able to crossmatch. Whether it’s typos, using obscure initialisms, or deciding that “name” sometimes means “full name” and sometimes means “nickname of the day”, your data is never going to be a perfect match. And let me tell you, those problems are multiplied if some of that data is entered by children!
I’ll take you through the techniques from edit distance to generative AI (and combinations thereof) I have used to match up proper nouns ranging from people’s names to schools and see how some techniques work for people and not for places, and vice versa. I’ll show you how it is working in practice to help track event participants over the course of a single workshop day and ultimately over the course of their journey through our community.
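At the simple end of that spectrum, the standard library already gets you surprisingly far; a small illustrative sketch (the school names are invented):

```python
from difflib import get_close_matches

known_schools = ["Northside High School", "St Mary's College", "Westfield Primary"]

def best_match(raw: str, candidates: list[str], cutoff: float = 0.6) -> str | None:
    """Edit-distance-style matching: good for typos, weak on abbreviations."""
    hits = get_close_matches(raw.strip().title(), candidates, n=1, cutoff=cutoff)
    return hits[0] if hits else None

print(best_match("northside high scool", known_schools))  # typo still matches
print(best_match("SMC", known_schools))                    # None: initialisms defeat pure edit distance
```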
Models based on physical processes and equations of motion have been built, over decades, to model the physical structure and composition of the ocean, atmosphere, and land surface. Other scientific models have been built to process very complex, high-dimensional data from instruments such as satellites and radar into simpler data products such as images and surface field estimates. These models are often computationally intensive.
Prior to around 2022, the complexity of many of these tasks was seen as infeasible for machine learning systems. In the years since, deep neural networks have achieved state-of-the-art performance across the types of models described above.
PyEarthTools is an open-source, Python software framework that supports the development of machine learning models, big and small, for Earth system science. See https://pyearthtools.readthedocs.io/en/latest/ .
PyEarthTools contains modules for:
- Loading and indexing Earth system data into Xarray;
- Processing and normalising Earth system data for machine learning;
- Defining machine learning (ML) models;
- Training ML models and managing experiments; and
- Evaluating ML models.
Come to this talk to:
- Get started with machine learning for Earth system science
- Learn about the PyEarthTools framework and how to build re-usable ML pipelines
- Catch the science bug and try out some simple projects
- Train your own weather models from scratch
This talk is suitable for beginners through to professional scientists and data scientists.
PyEarthTools was initially developed by the Bureau of Meteorology (Australia), and now also has developers from the National Institute of Water and Atmospheric Research (New Zealand), and the Met Office (United Kingdom).
Detecting misconfigured permissions and sensitive data leaks across enterprise knowledge base platforms like SharePoint, Confluence and Google Drive demands analysing massive volumes of document metadata - permissions, user identities, groups, and organisational structures. This challenge rapidly becomes overwhelming at scale due to the terabytes of data involved.
In this talk, I'll share our practical experience building a data pipeline in Python to tackle anomaly detection at scale using Dask, a flexible, open-source Python library that enables parallel computing and scales data processing workloads seamlessly from a single machine to distributed clusters.
Our pipeline ingests raw metadata from knowledge base APIs, transforms it into canonical CSV datasets, and applies a combination of unsupervised machine learning techniques (e.g. NMF/SVD), rule-based logic, and sensitive-data-pattern detection. Results are translated back into structured formats (jsonl), to publish actionable security alerts for our clients.
Dask’s familiar pythonic API let us scale beyond single-node limits and handle large, I/O-heavy computations. We'll highlight some key Dask concepts we relied on.
However, scaling up wasn’t without challenges: we faced numerous performance bottlenecks and memory management issues, including out-of-memory (OOM) errors and task graph stalls.
I’ll walk through the key strategies we developed to overcome these challenges, including custom loaders to better manage memory usage and mitigate issues caused by handling large volumes of files. We’ll cover how we used strategic repartitioning to rebalance workloads, and selectively persisted intermediate results to avoid redundant computation. Finally, we’ll explore how the Dask dashboard helped us pinpoint bottlenecks and debug stalled graphs in production.
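For readers new to Dask, the repartition-and-persist pattern mentioned above looks roughly like this; the file names, column names and cluster sizing are illustrative, not our production configuration.

```python
import dask.dataframe as dd
from dask.distributed import Client

client = Client(n_workers=4, memory_limit="4GB")   # local cluster; adjust for your environment

# Hypothetical metadata dump; the real pipeline ingests exports from knowledge base APIs.
df = dd.read_csv("permissions-*.csv", blocksize="64MB")

# Rebalance partitions so no single task holds too much data in memory.
df = df.repartition(partition_size="100MB")

# Persist an intermediate result that several downstream steps reuse,
# so it is computed once and kept on the workers.
per_user = df.groupby("user_id").size().persist()

print(per_user.nlargest(10).compute())
```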
Attendees will leave with practical insights and effective debugging techniques they can apply to scale their own data workloads with Dask.
Showcase of student projects
This talk explores how the social sector can leverage responsible GenAI and agentic AI to expand equity, inclusion, and impact—while staying true to human dignity and ethical principles.
WebHorus and WebWenet are entirely browser-based implementations of the Horus and Wenet radio demodulators, which are commonly used for receiving amateur high-altitude balloon telemetry and imagery. Producing web versions of these tools has significantly lowered the barrier to entry for setting up a receiver. It's even possible to receive on a mobile phone connected to a USB software defined radio!
We'll be running through how we compiled the C applications into Python modules, built the Python modules for WebAssembly and finally how they are called from JavaScript. No server involved. Packaged into a progressive web app that even works offline.
The Data & AI specialist track is all about the technology, skills, and practices that engineers, data scientists, and researchers need when building effective and reliable solutions that involve data and leverage different flavours of machine learning and automation. We’re excited to hear talks that cover a wide range of topics relating to data science, data engineering, ML engineering, and modern artificial intelligence.
Thanks for coming to the Education track!
Thanks for coming to the Scientific Python track!
Everything you need to know if you've never been to a PyCon AU before.
Welcome to Saturday!
An amazing keynote!
EVs are transforming transport, while Australia has unique resilience requirements for our beloved road trips. Could I build an app with data & AI to help more Aussies electrify their rides?
Come on a journey through operations research, spatial data, graphs and more, prototyping with python, streamlit and a host of useful libraries and open data services. Follow the winding road of development as we encounter and resolve challenges.
We'll also explore the applicability of this approach to related problems in EV network and fleet planning. I hope you'll leave full of charge for your next data-driven adventure.
You, by which I mean all the Eukaryotes in the audience, are running on an operating system a couple of billion years old. Every cell in your body is full of primordial libraries, monkey patches, self-modifying code, viral hacks and even containers embedding a different operating system. This talk is all about the freakish parallels between cell biology and computer architecture, and will leave you less confident than ever that your seemingly complex life is anything more than a molecule's way of making more molecules.
Details on this talk will be announced soon
There is a lot of hype swirling around about AI coding. But there are many sceptical developers who have tried AI coding tools and found them rubbish.
This talk aims to move beyond the hype and focus on AI coding best practices as the technology stands today. These tips are applicable both for developers who write code every day and for people who haven’t written a single line of code. Don't expect to become a 10x developer overnight. But with the right habits, you might get 10% more productive and a lot less frustrated.
Note this talk will focus on using CLI-based coding agents (e.g. Claude Code), rather than AI auto-completers in IDEs (e.g. GitHub Copilot, Cursor).
Python is slow for certain tasks. That's not news. The usual response is to either accept the performance hit or rewrite everything in another language. There's a third option: use Python as the control centre and delegate specific tasks to languages that excel at them.
This talk provides practical patterns for building polyglot applications. I'll show JavaScript handling real-time web interfaces, Rust rendering data visualisations, and C++ accelerating numerical computations—all orchestrated by Python. Through different short live demos and visual explanations, you'll learn when and how to integrate other languages without sacrificing Python's strengths, resulting in applications that are both fast and maintainable.
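One of the simplest versions of this pattern is Python calling into a compiled library via ctypes; the shared library and function below are hypothetical, shown only to illustrate the orchestration idea.

```python
import ctypes

# Hypothetical C++ hot loop compiled to a shared library, e.g.:
#   g++ -O3 -shared -fPIC fastsum.cpp -o libfastsum.so
lib = ctypes.CDLL("./libfastsum.so")
lib.sum_squares.restype = ctypes.c_double
lib.sum_squares.argtypes = [ctypes.POINTER(ctypes.c_double), ctypes.c_size_t]

data = (ctypes.c_double * 4)(1.0, 2.0, 3.0, 4.0)
# Python stays in charge of control flow; the numeric kernel runs in native code.
print(lib.sum_squares(data, len(data)))
```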
This session will be announced shortly.
Cubescape is a tiny company based in Melbourne that designs, builds, and operates Escape Rooms. Our rooms are full of technology and robotics, and run a mini-operating system across a network of Raspberry Pis all running python code with a heavy dose of asyncio. Starting with Python 3.3 and Debian 8 back in 2014 we've kept our oldest games running for ten years and will be launching our latest game this year with the same codebase, compatibly migrated to 3.11 and Debian 12. This talk will dive into our codebase to show how and why we did things the way we did, and some lessons learned from keeping a commercial project alive this long.
One of the big features added in Python 3.13 is Tier 3 support for iOS and Android as platforms. This means you can now run Python code that uses the standard library on mobile devices without modification. However, most interesting projects also use code from PyPI... so how do you get that code to run on iOS and Android? How do you even produce a wheel in the first place?
This session will be announced shortly.
Python – like many languages – lets you do things that are completely inadvisable. Many of the features that (left unchecked) allow you to do inadvisable things were used to achieve things that have since become necessary and defining features of Python.
Python – unlike many languages – discovered that leaving these obvious and necessary features lying around next to inadvisable things was a bad idea, and built guardrails around them.
In real life, guardrails are structures that make it easier to understand how to be safe in an area where there is otherwise danger. If respected, guardrails make you safer, but unlike walls or fences, guardrails do not block you from danger.
In Python, features like decorators, context managers, async functions, importlib, and more are all guardrails that let you work with less-safe Python machinery from a much safer distance.
In this talk, we’re going to explore the idea of guardrails as a design philosophy, and use that to explain Python’s attitude to safely working with the language and its internals.
We’ll explore features of Python that are guardrails around less-safe features – what features they replaced, how those features could be used incorrectly, and how the newer features allow you to use Python more safely. As a special treat, you may also get to see how Python lets you abuse these features*.
We’ll conclude with a discussion of how you can use Python’s guardrail philosophy in your own code.
(* SKILLED OPERATOR ON CLOSED CIRCUIT; DO NOT ATTEMPT)
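As a small illustration of the guardrail idea (a generic example, not taken from the talk), compare manual resource handling with a context manager that owns the unsafe part:

```python
from contextlib import contextmanager

# Without the guardrail: manual acquire/release, easy to leak on an early return or exception.
def risky(path):
    f = open(path)
    data = f.read()
    f.close()          # skipped entirely if read() raises
    return data

# With the guardrail: the context manager owns setup and teardown.
@contextmanager
def opened(path):
    f = open(path)
    try:
        yield f
    finally:
        f.close()      # always runs, however the block exits

def safe(path):
    with opened(path) as f:
        return f.read()
```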
If there's anything that AI is known for, it's not speed. Other than throwing more GPUs at the problem, what can you do to make AI models run faster? In this talk, you'll learn about AI model performance, what impacts it and what you can do to improve it.
When you're running a web service used by customers or running a daily data pipeline, you need a way to know if it's working. One option is to wait until users start flooding your support system or someone idly comments that the data seems old and you realise your data pipeline hasn't worked for a week. Another option is to have your code tell some central system whether things are working or not and then make pretty graphs out of it, and potentially wake someone up if things are bad enough. Having the machines automatically tell you if things are wrong rather than waiting for the humans is generally preferable.
This talk will be a beginner to intermediate introduction to observability and to OpenTelemetry, a vendor-agnostic collection of APIs, SDKs, and tools for observability. I will answer questions like:
- What do you mean by telemetry? What are metrics, logs and traces and when would I use each?
- What's OpenTelemetry? What does it give you out of the box? How do all the pieces fit together?
- Okay, I want to spin this thing up, what do I need to know?
I will also cover some of the business goals that can be achieved by using OpenTelemetry, eg standardised tagging, scrubbing PII and cost attribution.
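To make the pieces concrete, here is a minimal tracing setup with the OpenTelemetry Python SDK, exporting spans to the console; a real deployment would export to a collector instead, and the span and attribute names here are invented.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Wire up a tracer provider that prints finished spans to stdout.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("data_pipeline")

def run_daily_pipeline():
    with tracer.start_as_current_span("daily_pipeline") as span:
        span.set_attribute("rows_processed", 12345)   # machines report status, humans read graphs
        # ... actual pipeline work ...

run_daily_pipeline()
```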
Kraken Technologies ships over 100 versions of code per day across 25 environments in more than 10 countries with new versions being kicked off to deploy upon merge into master.
We want to be able to run jobs across all of our various environments without slowing down developers (by blocking deployments) or having an impact on clients during their operating hours (outages or performance degradation).
But how do you run jobs out of hours when there are no time zones that are out of hours for everyone? In this talk, I will introduce the Housekeeping Framework, our internal solution to this problem.
When you're learning to code, no matter what the language, you learn small components: write a function to do something, create a class to do other things, add a user interface component and so on.
Then you get out into the real world and there's nobody telling you what components to write. You know how to code, but you don't know how to develop. So you do more courses and tutorials, watch more videos and read more documentation. But you still don't know how to take that extra step and apply your knowledge to develop an entire project. You're stuck in tutorial hell!
I'm going to go beyond the code and show you how to design, structure, and document your projects. I'll also show you how you can use AI to help.
These problems and their solutions will apply to every language and development environment you'll use through your career, whether or not you're using Python. By the end, I hope I will have given you some tools to help you escape from tutorial hell and start developing your own projects.
Python has match statements now. What are they? Why are they? How do they work? Discovering the true power of Python's structural pattern matching has made my classes more powerful than ever! Want to learn this power? Then come to my talk! It promises to inform, engage and entertain with real world worked examples, arcane facts, and interactive questions.
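If you have not met match yet, here is a tiny taste of structural pattern matching against a class (a generic example, not the talk's worked examples):

```python
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float

def where(p):
    match p:
        case Point(x=0, y=0):
            return "origin"
        case Point(x=0, y=y):
            return f"on the y-axis at {y}"
        case Point(x=x, y=0):
            return f"on the x-axis at {x}"
        case Point(x=x, y=y):
            return f"at ({x}, {y})"
        case _:
            return "not a point"

print(where(Point(0, 3)))   # on the y-axis at 3
```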
Have you ever thought about adding a feature to the Python language itself?
If you have, or just want to learn a bit about how Python works under the hood, we will step through the story of adding a small feature to a fork of Python, learning about compilers and interpreters along the way!
A system that applies machine learning to make UI tests more resilient by automatically detecting and fixing broken selectors, reducing rework and increasing test stability.
When something has gone wrong in your neighbourhood, and they're calling you... whatcha gonna do?
Like regular talks, but shorter! Anything could happen!
Thank you for coming to Saturday!
Welcome to Sunday!
Designing for inclusion isn’t about ticking boxes, avoiding lawsuits, or expanding markets—it’s about people. True inclusion starts with the individuals we meet and the stories we choose to share.
My journey into digital accessibility has been deeply personal—shaped by my Dad, who was blind, navigating a world that too often overlooks people like him.
As the designers and builders of today’s digital society, we have the power—and responsibility—to make everyone feel welcome. This talk explores how storytelling and small, intentional design decisions can spark more inclusive experiences, both at work and in the world around us.
"In the beginning, the Universe was created. This has made many people very angry and has been widely regarded as a bad move." - Douglas Adams
Computers, despite a frankly alarming amount of hype, are not good at nuance. They are usually fast, occasionally obedient, and capable of storing quite a lot of cat pictures, but when asked to represent something as slippery as "a person" or "Tuesday", things begin to get weird.
Human beings remain the most reliable mediums for interpreting reality into a format that machines can parse. Regrettably, reality is complex. We make heavy use of abstractions to handle this complexity, but if you ask any seasoned developer about designing systems to handle names, time, geography, or a thousand other "standards", you will be met with either a thousand-yard stare or a wild, keening noise. If no group chat can agree on whether cereal is a soup, how can we tell a computer what to do?
Apart from the difficulty of agreeing on categories, reality's refusal to be abstracted neatly can lead to system inaccuracies, poor user experiences, security vulnerabilities, and the amplification of social harms. But given our industry, our systems of government, and (quite often) our sense of self are built on top of the very same kind of abstractions, how can we do better in the systems we are responsible for?
In this talk, we will look at some of the most common ways that our systems and data models frequently do not match reality, explore approaches to handling reality gracefully, and consider how to anticipate flaws in those models and minimise harmful outcomes. This will be an introduction to the topic for some, a refresher for others, and possibly a useful thing to show that one boss with unrealistic expectations.
Attendees will leave with a better understanding of how and when to make effective use of abstractions in systems, and probably an existential headache.
Every software system can benefit from being "faster".
But what does “faster” mean, exactly? How do we get to “faster”? If everyone is trying to build performant systems, then why are so many of them so slow?
Let's talk about the fundamentals of performance engineering, and how to use them to design and build systems that feel "faster" - even when that's not quite what you thought.
This is an overview of the tips and tricks I learned while bringing a mathematical optimization model from concept to prototype to production.
At its core, the math model is a multi-dimensional knapsack problem, with tens of thousands of items moving between warehouses with up to a hundred user-defined constraints. The model is solved with Google's OR-Tools and is deployed to AWS Lambda.
Some of the important ideas I want to share are how to build a test suite to ensure that a model of this type is internally consistent, and how to handle errors-as-data for your end users. I also learned a really nice way to configure logging in Python.
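For readers who have not used OR-Tools, a single-constraint toy knapsack in CP-SAT looks roughly like this; the values, weights and capacity are invented, and the production model has far more dimensions and constraints.

```python
from ortools.sat.python import cp_model

# Toy knapsack: pick items to maximise value within a weight budget.
values = [10, 13, 7, 4]
weights = [5, 6, 3, 2]
capacity = 10

model = cp_model.CpModel()
take = [model.NewBoolVar(f"take_{i}") for i in range(len(values))]
model.Add(sum(w * t for w, t in zip(weights, take)) <= capacity)
model.Maximize(sum(v * t for v, t in zip(values, take)))

solver = cp_model.CpSolver()
status = solver.Solve(model)
if status in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    chosen = [i for i, t in enumerate(take) if solver.Value(t)]
    print("take items:", chosen, "value:", solver.ObjectiveValue())
```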
This session highlights how AI tools like GitHub Copilot are helping bridge gaps in technical confidence and access, especially for underrepresented developers. We’ll share stories from the launch of the Code; Without Barriers program in Australia, and show how Python and AI can empower new coders to build confidently and creatively.
It's impossible to do software without the power of fonts and text rendering. They underpin the terminal we use to run our code, the browsers we use to run our web apps and the emails we get paid too much to write.
But for many they are a black box of standards and code that seem to effortlessly click together, until they don't. This talk explores how this works, from font file structures to internationalization issues and font stack management. Finally we use the knowledge we've gained together to poke at some non-standard use cases for fonts, which pull back the curtain to show a font file for what it is: arbitrary instructions on how to render a shape on the screen. Cats are involved.
Sometimes you inherit a clunky pipeline. Sometimes an LLM writes one for you. Either way, you’re stuck with something slow, memory-hungry, and hard to scale.
This talk is about what happens next — how to turn a naive tabular data pipeline into something fast, efficient, and scalable. You’ll get a guided tour through a zoo of optimization techniques: reducing algorithmic complexity, minimizing memory usage, improving I/O throughput, and swapping in Polars — a fast, Rust-based DataFrame library — in place of Pandas (for reasons beyond just hype). By walking through a real-world example step by step, you’ll see how each change makes an impact — and come away with a sharper eye for spotting similar bottlenecks or inefficiencies in your own pipelines.
The walkthrough is grounded in a real-world ML feature engineering task from the aviation industry. But in the spirit of spring, we’ll swap baggage belts for bird feeders — and reframe the problem through a birdwatcher’s lens, not by tracking airport operations, but by counting sparrows and mynas visiting my backyard feeder.
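A small, hypothetical flavour of the end state: a lazy Polars pipeline over the bird-feeder log, where the filter is pushed down to the scan instead of loading everything first. The file and column names are invented for illustration.

```python
import polars as pl

visits = (
    pl.scan_csv("feeder_log.csv")                              # lazy: nothing is read yet
      .filter(pl.col("species").is_in(["sparrow", "myna"]))     # pushed down to the scan
      .with_columns(
          pl.col("timestamp").str.to_datetime().dt.truncate("1h").alias("hour")
      )
      .group_by(["species", "hour"])
      .agg(pl.len().alias("visits"))
      .collect()                                                # materialise only the result
)
print(visits)
```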
The Model Context Protocol (MCP) is an open standard for connecting AI assistants to the systems where data lives, including content repositories, business tools, and development environments. This talk explores practical implementation patterns for building robust MCP servers in Python, covering architecture decisions, error handling strategies, and real-world deployment considerations. You'll learn to move beyond basic examples to production-grade implementations that can reliably serve AI applications at scale.
Providing the right context to large language models is challenging. Every integration—whether accessing local files, connecting to Google Calendar, or querying internal databases—requires custom work. Each tool connection becomes a separate, manual effort. This is where MCP comes in. Model Context Protocol (MCP), in simple terms, is a universal adapter for AI models that need external context. It standardizes how context is passed to models, making it easier to build and manage these connections.
My talk will not only focus on how to get started with MCP, but will also outline how to build MCP Agents using your own MCP Server tools and the MCP Client UI with Streamlit. At the end of the talk, we will also discuss the security issues associated with MCP and how one can mitigate them.
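To show how little boilerplate a basic server needs, here is a minimal sketch in the style of the MCP Python SDK's FastMCP helper; the tool and resource are stubs invented for illustration, not a production integration.

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("calendar-demo")

@mcp.tool()
def list_events(day: str) -> list[str]:
    """Return calendar events for a given ISO date (stubbed for this sketch)."""
    return ["09:00 stand-up", "13:00 planning review"]

@mcp.resource("notes://{topic}")
def get_notes(topic: str) -> str:
    """Expose read-only context the model can pull in on demand."""
    return f"Notes about {topic}..."

if __name__ == "__main__":
    mcp.run()   # speaks MCP over stdio by default, so a client or agent can connect
```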
With the arrival of large language models and the generative tools built on them we have seen a huge increase in pressure to use these tools in our work.
This worries me, but it's hard to easily explain why.
Am I being paranoid? Or just overly opposed to a change that's inevitable? Is it just... a skill issue?
I invite you to listen to me dig through the history of the relationship between automation, technology, and work to try to unpack my feelings on where our industry is headed.
How do you lead digital transformation when your starting point is paper, pens, and a patchwork of spreadsheets? In this session, I share how business users helped drive real change at one of Australia’s largest food and hospitality service providers—progressing from manual paperwork to a cloud-native platform powered by Python, Django, Docker, and Kubernetes.
In this presentation, we will demonstrate how we use Python as a bridge to facilitate a live music concert. Using computer vision techniques with libraries such as OpenCV and MediaPipe, we detect various animal plushies – including koalas, whales, otters, octopuses, and Blåhaj – each mapped to their own unique sonic identities.
The position and movement of these plushies in space are tracked to control sound parameters such as panning, pitch, reverb level, and tempo modulation, thereby creating a playful, gesture-based system for musical expression, blending tangible interaction with digital sound synthesis.
We will walk through how we’ve applied Python for both networking (via OSC) and computer vision, and share why Python’s readability and strong ecosystem made it the perfect fit for fast prototyping and real-time control. We’ll also touch on our creative process, including mapping physical toys to digital instruments, and ensuring smooth, latency-free performance.
The presentation will close with a five-minute interactive concert, where the audience will experience how plushie placement and movement generate evolving soundscapes in real time. We hope to inspire others to explore Python as a tool not just for logic and data, but also for creative expression.
I found some vulnerabilities in Python's standard library, and now you've all had to upgrade your Python. Sorry, not sorry.
My day job is focused on open source and software supply chain security. This has made me curious - how trustworthy even are the core technologies our ecosystems are built on, like 46-year-old archiving formats?
So after I read a vulnerability report that exploited symlinks in TAR files, I wondered whether Python suffered the same problem. I started poking around and ended up finding an arbitrary write path traversal in Python's standard library.
This talk will provide a detailed look at this vulnerability and demonstrate how it can be exploited by an attacker to compromise an exposed system.
I’ll also discuss how these vulnerabilities demonstrate key security challenges facing developers while building their projects. The challenges range from the different incentives between libraries and their applications, the limits of abstractions, and the difficulties of hardening legacy code.
With movement towards more regulation, like the EU's Cyber Resilience Act, and more interest in improving software security, appreciating these security challenges can help developers focus more on building exciting projects than mitigating vulnerabilities.
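As one concrete guardrail in this space (not a substitute for the talk's detail): recent Python versions ship extraction filters for TAR archives, which reject path-traversal and symlink tricks at extraction time.

```python
import tarfile

# Available in Python 3.12+ (and backported to recent 3.11/3.10 releases) via PEP 706:
# the "data" filter refuses members that would escape the destination directory.
with tarfile.open("untrusted.tar") as tar:
    tar.extractall(path="unpacked/", filter="data")   # raises on traversal or symlink attacks
```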
This talk will go into detail about how Python is used as the glue to transfer data between applications, and by extension, between artists; and how we use it to allow artists to spend more time creating and less time worrying about technology and associated problems. This talk requires virtually no technical understanding of Python, and will present only a small amount of code -- it is intended for people who are interested in receiving a glimpse "behind the curtain" to the technology used in the production of CG and VFX in content of all shapes and sizes, and seeing how Python is a core part of that. Attendees will leave with, not only an understanding of how Python is utilised in the Visual Effects industry, but with a better understanding of how Visual Effects and CG is produced in general.
Number 5 will shock you!
Forget what you think you know about robust dependency graphs, the security gains of living at Head, and those supposedly solid requirements.txt. We'll get down to the nitty-gritty of open source security, giving you real-world large-scale insights to understanding common misconceptions across programming ecosystems.
While it’s true that there is only one dependency graph (for you) (*right now) it’s not always understood what impact this can have at an ecosystem level.
We’ve got ecosystem level stats on just how many PURLs map to multiple different packages, dependency graph shifts that happen faster than you can type git commit, and some surprises with Git (im)mutability!
We will talk about vulnerabilities in your transitive dependencies, understanding what even ARE your dependencies, and trying to identify what that one (*for certain values of one) CVE you were supposedly affected by actually is. (Not to mention what, if anything, you can do about it.)
You’ll leave this talk with a better understanding of open source edge cases and just how common they are. You’ll be shocked, amazed, horrified, and hopefully a little optimistic about the state of open source security and your place within it.
This talk is about a project currently under embargo. Come back soon to find out more!
Have you ever wondered if life would be better with a robot scorpion by your side? Now’s your chance to find out! Meet Pinchy — a Raspberry Pi-powered, fully autonomous, and undeniably eye-catching robot companion.
In this talk, you’ll learn how Pinchy maps its environment and navigates with ease using SLAM algorithms and computer vision. We’ll also dive into the efficient design inspired by real scorpions, blending biology with engineering.
Whether you're into robotics, Python, or just want to see a cool scorpion skitter across the stage, come say hi to Pinchy this PyCon!
The cyber threat landscape is vast, deep and ever-changing. Short of retraining as cybersecurity professionals, how can we, as Python developers, do our part to help keep ourselves, our customers and our data safe? In this talk, we'll look at the current threat landscape, the ways developers commonly fall short, and just how simple it can be to drastically reduce the "oops factor" of our Python development lifecycle.
What happens when you combine quantum computing, neuroscience, and reinforcement learning—all in Python? This talk explores how parameterized quantum circuits can be used as “brains” for agents inspired by the nematode C. elegans, and how these agents learn to navigate their world using reinforcement learning. We’ll see how quantum and classical approaches compare, and what this means for the future of AI and quantum machine learning.
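If you want to experiment before the talk, a parameterized circuit acting as a tiny "policy" can be sketched with PennyLane (one possible framework; the encoding and readout below are illustrative only, not the speaker's model):

```python
import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev)
def policy_circuit(params, obs):
    # Encode the agent's observation, then apply trainable rotations and entanglement.
    qml.RY(obs, wires=0)
    qml.RY(params[0], wires=0)
    qml.RY(params[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(1))   # expectation value read out as an action preference

params = np.array([0.1, 0.2], requires_grad=True)
print(policy_circuit(params, 0.5))
```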
Like regular talks, but shorter! Anything could happen!
Thank you for coming to PyCon AU 2025!