PyCon Lithuania 2025
The talk has two parts: a show-and-tell of what has happened, and is happening, in our world through the use of technology, the same technology that we, the people in this room, create; and a message that together we have the power to change this. We can show solidarity with other human beings and groups, and fight back against evil and morally wrong technology decisions.
Architecture as Code (AaC) was born to let you prototype a new system architecture design without any design tools. Available tools currently support on-premise infrastructure and the main cloud providers, including AWS, Azure, and GCP.
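No specific tool is named above; as one illustration, the open-source diagrams package renders an architecture diagram from pure Python. A minimal sketch:

```python
from diagrams import Diagram
from diagrams.aws.compute import EC2
from diagrams.aws.database import RDS
from diagrams.aws.network import ELB

# Renders web_service.png: a load-balanced web tier backed by a database.
with Diagram("Web Service", show=False):
    ELB("load balancer") >> EC2("web server") >> RDS("database")
```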
Code review is a central part of the everyday developer job. The motivation for creating this talk was a quote:
“The most important superpower of a developer is complaining about everybody else's code”. In this talk I’ll explain my approach to better code review.
Python has emerged as a versatile tool for 3D computer graphics, offering powerful capabilities in modeling, animation, and simulation. This presentation explores the application of Python in creating dynamic and visually engaging 3D graphics using Blender. The session will showcase practical examples that demonstrate Python's potential in various aspects of 3D graphics.
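As a taste of the kind of scripting the session covers, here is a minimal sketch using Blender's bundled bpy API (the exact examples shown may differ):

```python
import bpy

# Add a cube and keyframe its Z location so it rises over one second at 24 fps.
bpy.ops.mesh.primitive_cube_add(location=(0, 0, 0))
cube = bpy.context.active_object
cube.keyframe_insert(data_path="location", frame=1)
cube.location.z = 5.0
cube.keyframe_insert(data_path="location", frame=24)
```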
Let us build on examples from NumPy, pandas, and Matplotlib to explore techniques and tools with the Sphinx documentation generator. Learn how to implement styles, include advanced elements, and overcome challenges in creating clear, maintainable docs. 📑✨
Explore the transformative journey of Python's Global Interpreter Lock (GIL). Delve into the GIL's origins, its role in Python's growth, and the challenges it poses for multicore processing. Let's discover the implications of its experimental removal in Python 3.13, and how this shift might redefine concurrency, performance, and the future landscape of Python applications across various domains.
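If you want to probe this yourself, a small check (assuming a CPython 3.13+ interpreter) reveals whether you are on a free-threaded build:

```python
import sys
import sysconfig

# 1 on free-threaded (PEP 703) builds, 0 or None otherwise.
print(sysconfig.get_config_var("Py_GIL_DISABLED"))
if hasattr(sys, "_is_gil_enabled"):  # added in CPython 3.13
    print(sys._is_gil_enabled())     # False when running without the GIL
```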
Struggling with slow I/O in your Django apps? Want to maximize server resources? This talk explores asynchronous Python and its impact on Django.
Let's clarify parallel vs. concurrent programming, and demystify Python's concurrency model, focusing on coroutines and the event loop. Learn how asyncio enables efficient, non-blocking code, handling concurrent requests without thread/process overhead and how everything is integrated into the Django framework.
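As a flavour of the pattern discussed, here is a hypothetical async Django view (endpoint URLs invented; assumes Django 3.1+ and the httpx client) where two slow I/O calls run concurrently on the event loop:

```python
import asyncio

import httpx
from django.http import JsonResponse

async def dashboard(request):
    # Both upstream requests are awaited together rather than sequentially.
    async with httpx.AsyncClient() as client:
        weather, news = await asyncio.gather(
            client.get("https://weather.example.com/today"),
            client.get("https://news.example.com/top"),
        )
    return JsonResponse({"weather": weather.json(), "news": news.json()})
```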
Can you imagine a Python project without any return?
Is it overhead or a memory saving? Is it complicated, or would it reduce complexity? Is it testable, or a horror for unit testing?
"Trusted publishing" is the term for using the OpenID Connect (OIDC) standard in the Python Ecosystem to release on PyPI. In this talk will go though the usage of trusted publishing in any Python project and how it helped Ansible project to open up release management to the community. This talk is a deep dive explanation of release practicalities of releasing Ansible using trusted publishing.
The batch mechanism is challenging when handling continuous data migration with DataProc. However, I'm introducing a new approach for continuous data pipelines enabled by PySpark. Participants will learn new methods to handle data consistency and preserve data completeness in a million-scale migration from a SQL database into a NoSQL database, MongoDB.
The Creator Developer Economy offers developers the chance to turn their skills into a passive income stream. In this talk, I’ll explore how developers can leverage Apify's tools to build, deploy, and monetize web scraping solutions, from using Crawlee for Python to create efficient scrapers, to publishing them and earning.
The new version of Django has several important features that let us avoid installing additional modules. Libraries like DRF, drf-yasg, and drf-spectacular have long been recommended for REST API development. Now the rules have changed.
Business specifications are often vague or incomplete, making development challenging. Acceptance Test-Driven Development (ATDD) bridges this gap with clear, executable specifications. This talk explores how Robot Framework enhances collaboration between business and development teams. Through practical examples, we’ll show how to write effective tests and extend Robot Framework with custom Python libraries. Gain insights and tools to improve communication, development, and software delivery.
If you work with web services, you’re probably using containers… and you’re also probably not doing it as well as you could. In this talk, we’ll go over best practices for container images to produce lightweight, safe and modular containers for quick and efficient builds.
Programming is to a large extent about removing repetition and finding abstractions that achieve that. But is that always sensible? Using the power of music we will examine how universal DRY really is.
Discover how Protobuf can transform your REST API's schema evolution, while offering performance gains over JSON. This session covers Protobuf's strong versioning, ensuring seamless API updates without breaking clients. We'll tackle the challenges we faced at KAYAK, like the learning curve and integration complexity, offering strategies to address them. Gain practical insights and benchmarks as we discuss integrating Protobuf with Python frameworks, boosting your API's efficiency and adaptability.
Coding aesthetics, in this context, refers to how code is written. It is essential that programmers also pay attention to the aesthetics and not just the functionality the code aims to achieve. This talk explores several ways to make Python code aesthetically pleasing, such as code refactoring, using static code analysis tools like PyLint to check compliance with PEP8 guidelines, and applying syntactic sugar. In addition, we will discuss the limitations of PEP8 and how we can make more pragmatic choices.
After creating a great web app using Python, for example with Flask, the next hurdle on the way to production is how to make it available to users and operate it. And not just your app, but also ingress, the database, observability, and the list goes on. We will go through your options for simplifying the operations of your web app using open-source tooling, including using k8s directly with Helm charts, PaaS using fly.io, and new tooling developed by Canonical using Juju. By the end of the talk you will have seen the main options and their trade-offs.
Virtual environments are a fundamental part of Python development, but to most developers they’re largely a ‘black box’. In this talk, we’re going to dissect the code, file structure, and utilities that make them up, to learn deeply, and not just superficially, how venvs actually work.
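The stdlib itself exposes the machinery behind `python -m venv`; a minimal sketch of the kind of internals the talk dissects:

```python
import venv
from pathlib import Path

# EnvBuilder writes pyvenv.cfg, links or copies the interpreter, and
# creates bin/ (or Scripts\ on Windows) with the activation scripts.
venv.EnvBuilder(with_pip=True).create("demo-venv")
print(Path("demo-venv/pyvenv.cfg").read_text())  # home = ..., version = ...
```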
The modern Python programmer spends little time thinking about the classic ‘Design Patterns’ from the 1990s. Why are they no longer relevant? This keynote address will explore how we write Python code today, and how it avoids the problems that design patterns were meant to solve.
Machine learning models are never truly “done.” As data evolves, so should the models that rely on it. But how can we ensure continuous improvement without costly retraining or manual intervention? In this talk, we introduce an automated pipeline designed to incrementally enhance model performance by systematically testing and integrating new features.
Many organizations have migrated their data warehouses to data lake solutions in recent years.
With the convergence of the data warehouse and the data lake, a new data management paradigm has emerged that combines the best of both approaches: the bottom-up of big data and the top-down of a classic data warehouse.
In this talk, I’ll share how I began trading stocks and why I turned to Python to track my performance—along with the abundance of surprises that came with it. We’ll walk through the building blocks of two Python-powered apps: one that extracts stock transactions from screenshots, and another that generates summaries of my trading to uncover valuable insights.
This talk explores the synergy between Apache Beam and Apache Airflow, demonstrating how to create a robust, end-to-end data engineering workflow. We'll dive into the challenges of orchestrating complex data processing tasks and show how combining Airflow's scheduling capabilities with Beam's data processing framework can create more efficient and manageable data pipelines. The session will cover integration with Google Cloud Platform services, including Cloud Functions, BigQuery, and Gemini AI models.
A case study of rewriting a simple data pipeline involving Python, a pinch of Go, Git workflows, Airflow, Postgres, and the cloud. We will investigate some common assumptions and principles of designing data pipelines, the benefits and issues of the tools, and how these may be handled.
I hope this case study of a pipeline rewrite will give you insights applicable to Python use in your own data pipelines, and into cloud pricing.
Maintaining Business Intelligence (BI) tool governance, managing permissions, syncing documentation, and handling schema changes can be chaotic. This talk explores how Python, Pydantic, and smart design patterns automate these tasks, ensuring seamless BI tool governance. Learn how to auto-sync table metadata, adjust queries on column renames, and enforce permissions effortlessly. With real-world examples, discover how to transform BI maintenance from a headache into a streamlined, automated process.
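One possible shape of this approach (models and field names are hypothetical, not from the talk): Pydantic validates table metadata pulled from the warehouse before it is synced to the BI tool, so schema drift fails loudly.

```python
from pydantic import BaseModel

class Column(BaseModel):
    name: str
    type: str
    description: str = ""

class Table(BaseModel):
    name: str
    owners: list[str]
    columns: list[Column]

# Invalid metadata raises a ValidationError instead of silently syncing.
table = Table.model_validate({
    "name": "orders",
    "owners": ["analytics@example.com"],
    "columns": [{"name": "id", "type": "INTEGER"}],
})
```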
The "data build tool" (DBT) was designed to unlock software engineering best practices for SQL-based data pipelines: pipelines as version controlled directed acyclic graphs (DAGs) consisting of testable and reusable nodes. With the increasing number of cloud data warehouses and data lakehouses that allow the native execution of Python code, DBT also added support for Python models. In this talk, I will explain how Flatiron Health uses DBT and share our experiences with unit and data testing.
Do you find it daunting to learn the complexities of a traditional web framework just to push your data science work online? Worry no more! Streamlit can speed things up, as it is designed for exactly this purpose: creating beautiful data-related web apps that can be deployed in minutes.
In the hands-on tutorial, we’ll go through various features of Streamlit and build a small lyric fetcher app based on the available curated dataset of around 24K Billboard top-100 songs.
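A minimal sketch of the kind of app we will build (dataset path and column names are placeholders, not the tutorial's actual files):

```python
import pandas as pd
import streamlit as st

st.title("Billboard Lyric Fetcher")

@st.cache_data  # cached across reruns until the inputs change
def load_songs() -> pd.DataFrame:
    return pd.read_csv("billboard_top100_songs.csv")

songs = load_songs()
query = st.text_input("Search by title")
if query:
    mask = songs["title"].str.contains(query, case=False, na=False)
    st.dataframe(songs[mask])
```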
We will dive into the fascinating world of football analytics, showcasing how to collect and process match data (e.g., Hudl Statsbomb, Sportmonks, and Understat), including player tracking, event logs, and tactical formations. Attendees will walk away with practical knowledge and Jupyter Notebooks, demonstrating Python's power in decoding modern football strategies.
Real-time data analytics is essential for powering modern applications like monitoring, personalization, search, and, to some extent, RAG pipelines. However, building systems that can handle real-time ingestion, indexing, and retrieval at scale is no trivial task. This talk provides actionable insights into designing and maintaining such systems at scale using best practices.
In this talk, we introduce cluster-experiments, a Python library designed to facilitate end-to-end A/B testing workflows, including power analysis, experiment analysis, and variance reduction techniques.
Forecasting is a common activity with clear business value in various domains, but it is not a skill many Data Scientists have or feel confident about. In this crash course I will cover the fundamentals of time series forecasting, from basic methods to more advanced techniques, showcasing practical code examples using libraries from Nixtla.
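As a taste of those examples, a minimal sketch with Nixtla's statsforecast (toy data; the course's own examples may differ):

```python
import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA

# Nixtla libraries expect long format: unique_id, ds (timestamp), y (value).
df = pd.DataFrame({
    "unique_id": "sales",
    "ds": pd.date_range("2020-01-01", periods=36, freq="MS"),
    "y": [float(i) for i in range(36)],
})

sf = StatsForecast(models=[AutoARIMA(season_length=12)], freq="MS")
sf.fit(df)
print(sf.predict(h=12))  # forecast the next 12 months
```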
Our mission is simple but profound: to improve and extend lives by learning from the experience of every person with cancer. This talk explains how we transform sensitive data from heterogeneous environments into research-grade datasets. And how we shift insights generation left to iterate faster.
Good retrieval performance is key to an effective RAG system, as it ensures relevant information is selected, directly impacting augmentation and generation quality. My presentation focuses on RAG indexing and retrieval, exploring methods to convert text into searchable formats, comparing techniques, and analyzing their advantages, disadvantages, and performance on an annotated dataset to enhance document retrieval based on user queries.
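A minimal dense-retrieval sketch with sentence-transformers (model choice and documents are illustrative, not the annotated dataset from the talk):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
docs = [
    "How to tune BM25 for short queries",
    "Dense embeddings for semantic document retrieval",
    "Vilnius public transport guide",
]
doc_emb = model.encode(docs, convert_to_tensor=True)
query_emb = model.encode("improving retrieval for RAG", convert_to_tensor=True)
# Returns the top-k corpus ids ranked by cosine similarity.
print(util.semantic_search(query_emb, doc_emb, top_k=2))
```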
In today’s fast-paced machine learning environment, the ability to efficiently manage and reuse features across multiple models is crucial. This workshop explores how leveraging a feature store can streamline ML pipelines by ensuring consistency and accelerating deployment cycles.
Participants will gain hands-on experience with setting up, managing, and integrating feature stores into their existing workflows—transforming raw data into valuable, production-ready features.
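The workshop's tooling is not specified above; as one common open-source option, Feast serves precomputed features online (feature and entity names here are hypothetical):

```python
from feast import FeatureStore

store = FeatureStore(repo_path=".")  # a feature repo with feature_store.yaml
features = store.get_online_features(
    features=["user_stats:avg_order_value", "user_stats:order_count"],
    entity_rows=[{"user_id": 123}],
).to_dict()
print(features)
```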
A brag document is a powerful tool to highlight your work by making it visible, measurable, and demonstrating its real impact on you and your organisation - but such a document can be time-consuming to maintain. My talk explores automating the writing process with language models fed with data from tools like Jira, Notion, and code commits. Learn how to save time, ensure no achievement goes unrecorded, and make your work stand out. Ideal for engineers at all levels looking to grow their impact.
Temporal is an open source, distributed, and scalable workflow orchestration platform designed to execute mission-critical business logic with resilience. Manage failures, network outages, flaky endpoints, long-running processes and more, ensuring your workflows never fail.
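A minimal sketch with Temporal's Python SDK (names are illustrative): the activity's I/O can fail and be retried, while the workflow's state survives crashes.

```python
from datetime import timedelta

from temporalio import activity, workflow

@activity.defn
async def send_invoice(order_id: str) -> str:
    # Flaky network calls would live here; Temporal retries on failure.
    return f"invoice sent for {order_id}"

@workflow.defn
class OrderWorkflow:
    @workflow.run
    async def run(self, order_id: str) -> str:
        # Workflow state is persisted, so an outage resumes instead of losing work.
        return await workflow.execute_activity(
            send_invoice, order_id, start_to_close_timeout=timedelta(minutes=5)
        )
```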
In data science, speed matters as much as accuracy, especially when users expect quick results. This talk explores simple yet effective techniques to boost performance and responsiveness on data-centric web apps based on practical experience working with Panel apps. While some strategies are case-specific, most apply broadly to data-driven projects.
Hear more about the evolving landscape of SQL transformation tools and data lineage challenges. Explore how sqlglot enables powerful SQL parsing and transformation capabilities, and see practical demonstrations of sqlmesh as a modern alternative to dbt. Learn about open-source approaches to data lineage tracking and discover how these tools are shaping the future of data engineering workflows.
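Two of sqlglot's core tricks in a few lines: dialect-to-dialect transpilation, and walking the parsed AST to extract the tables a query reads, which is the seed of lineage tracking.

```python
import sqlglot
from sqlglot import exp

sql = "SELECT user_id, COUNT(*) AS n FROM events GROUP BY user_id"

# Transpile Postgres SQL to Spark SQL.
print(sqlglot.transpile(sql, read="postgres", write="spark")[0])

# List every table referenced by the query.
for table in sqlglot.parse_one(sql).find_all(exp.Table):
    print(table.name)  # -> events
```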
This presentation examines approaches for detecting and eliminating near-duplicate images across datasets ranging from small collections to repositories containing millions of images. We will compare the performance of several embedding models, including CLIP, ResNet, and other variants, assessing their ability to capture semantic and perceptual similarity and performance tradeoffs. We will benchmark various vector database solutions on query speed and memory consumption.
We will explore the landscape of technical analysis libraries available for the Python language, including popular choices like TA-Lib (aka talib), Pandas TA, and Technical Analysis (aka bukosabino/ta) library.
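For instance, a 14-period RSI with the bukosabino/ta library on a toy price series (illustrative data only):

```python
import pandas as pd
from ta.momentum import RSIIndicator

close = pd.Series(
    [100, 101, 103, 102, 105, 107, 106, 108, 110, 109, 111, 113, 112, 115, 114, 116],
    dtype="float64",
)
rsi = RSIIndicator(close=close, window=14).rsi()
print(rsi.tail())  # NaN until enough history, then 0-100 momentum readings
```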
An intro to the Dagster open-source orchestration tool (a minimal asset sketch follows the outline below).
Where Dagster fits in the data tool stack.
What is Dagster, and who is it for?
What are its main use cases?
Testing the data and the code.
Deployment ideas to production.
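A minimal asset sketch (toy data, not from the talk) showing Dagster's core idea: dependencies are wired from function parameters, and assets can be materialized and tested like plain Python.

```python
from dagster import Definitions, asset, materialize

@asset
def raw_orders() -> list[dict]:
    # A real pipeline would pull from an API or database.
    return [{"id": 1, "amount": 42.0}, {"id": 2, "amount": 13.5}]

@asset
def order_total(raw_orders: list[dict]) -> float:
    # The parameter name links this asset to raw_orders.
    return sum(o["amount"] for o in raw_orders)

defs = Definitions(assets=[raw_orders, order_total])

if __name__ == "__main__":
    materialize([raw_orders, order_total])
```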
Are you using Airflow or Pandas? Great! You've contributed to better data management at your organization.
The breakthrough of AI has reignited focus on high-quality data and effective data governance (not as scary as it sounds!) and management practices. AI needs fit-for-purpose data to reach its potential, and we already have a powerful toolkit, like Airflow, Pandas, Matplotlib/Seaborn, or Great Expectations, to optimize workflows and ensure data quality.
A successful data scientist needs solid coding skills and must stay up to date with the latest artificial intelligence and machine learning algorithms. However, many other skills and experiences help you succeed in data science. In this talk Megan shares five of the most helpful career lessons she has learned in over eight years as a data scientist, including tips on advocating for your own career development, collaborating with other teams, and more.
Variable selection is often left up to an algorithm. However, controlling for some variables can improve measurement accuracy, and thus overall performance. On the other hand, certain "bad" controls can block pathways of relationships between variables that we want to preserve or create spurious correlations. Using real and simulated data, I explain when to reconsider your controls, and why that may significantly improve model accuracy.
Data management systems have gone through significant changes in the last 10 years, driven by user demands, novel techniques and improvements in hardware. These have far-reaching implications on how systems are deployed and used in practice.
In this talk, I will focus on three key aspects of modern data management systems: scalability, mutability, and interface. I will share my personal experiences, and will bring several examples from the database and data science worlds.
A
Multimodal AI is booming this year, with models capable of seeing, reading, and hearing. Advances in this field unlock many production use cases in robotics, document AI, computer/web automation, and more!
In this talk we will go through everything multimodal and open-source: a bit of background, libraries, very basic APIs to get you started with open-source models, popular open-source models, and use cases (multimodal agents, multimodal RAG, automated browser use, and more!).
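"Very basic APIs" can be as basic as this sketch (model choice illustrative; assumes the transformers library and a local image file):

```python
from transformers import pipeline

# Image captioning: a vision-language model "sees" the picture and writes text.
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
print(captioner("photo.jpg"))  # [{'generated_text': '...'}]
```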
Some of the latest big evolutionary steps in generative AI have been models that support function calling and “agentic” capabilities. This provides generative models with “tools” that allow them to go beyond generating outputs for simple queries, and to start planning the best way to solve complex queries. In this talk, we’ll dive into using vector databases as the backbone for these types of complex AI architectures, serving both as knowledge bases and as memory.
Pydantic, Pancakes and Poter: a quick though deep dive into the world of response modeling.
Discover how EGTL (Extract, Generate, Transfer, Load) extends traditional ETL by adding a “generate” step powered by GenAI. In this talk, I’ll demonstrate how Python pipelines on top of a data warehouse can automatically extract data, generate new insights, and deliver optimized transformations. We’ll explore practical workflows, real-world use cases, and best practices, equipping you to apply EGTL in your own data projects.
In an era dominated by data, businesses struggle with processing diverse, unstructured information across systems. This research presents an AI-powered pipeline addressing product matching challenges in retail and e-commerce. Our solution combines traditional matching algorithms with deep learning through a five-step process. This approach minimizes manual intervention while improving accuracy and efficiency.
AI is evolving from passive tools to autonomous agents, driving the rise of agentic workflows that can plan, execute, and optimize tasks with minimal human input. This session will explore how large language models and multi-agent collaboration power these systems, enhancing efficiency and innovation. Attendees will gain insights into real-world applications, challenges, and the future of autonomous AI systems, uncovering how agentic workflows are transforming industries.
This talk charts the evolution of Artificial Intelligence through the dual lenses of data and models, tracing AI’s journey from early symbolic systems to today’s advanced data-driven techniques. Attendees will learn how the interplay of ever-growing datasets and increasingly sophisticated model architectures has powered major breakthroughs, transforming AI from theoretical curiosity to a global catalyst for innovation.
AI agents are transforming the way we create applications. However, developing multi-agent applications can often feel complex and time-consuming. LangFlow simplifies this process by offering an intuitive, easy-to-use interface for building AI-driven solutions.
What can go wrong with tokenizer encodings? Everything! I will share my experience of understanding, misunderstanding, and ultimately learning to work with tokenization in LLMs. I will discuss what surprisal is, its relevance to my research, and its connection to tokenization. The talk will include various examples illustrating how misunderstandings of tokenization can arise, as well as strategies for debugging and preventing these issues.
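A classic example of such a surprise (GPT-2's byte-level BPE here; similar tokenizers show the same effect): a leading space changes the token ids entirely.

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
print(tok.tokenize("hello"))    # ['hello']
print(tok.tokenize(" hello"))   # ['Ġhello'] -- the Ġ marks the leading space
print(tok.encode("hello"), tok.encode(" hello"))  # two different ids
```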
AI-driven code generation can transform software development in regulated sectors like banking and insurance - but only if implemented securely and responsibly. In this talk, we’ll explore how to harness tools like GitHub Copilot and ChatGPT to boost productivity while ensuring compliance. Attendees will learn key considerations, best practices, and practical insights to keep code generation both efficient and fully auditable.
Scoris.lt utilized Large Language Models (LLMs) to address the challenge of improving SEO performance by generating financial descriptions for Lithuanian companies. The case study highlights the innovative application of LLMs and custom translation models to create high-quality, multilingual content at scale.
How can machine learning enhance biomedical image analysis? This talk explores the potential of Python and PyTorch in automating artifact and damage segmentation. From data preprocessing to clustering-based label classification and deep learning-driven segmentation, key techniques will be discussed, including the use of Convolutional Neural Network architectures. The session will also cover performance evaluation and insights into advancing biomedical imaging with AI-driven solutions.
Unlock sensitive data potential with anonymization! Learn how Python, diffusion models, and Named Entity Recognition (NER) empower institutions to anonymize PII in financial documents, replacing it with synthetic stand-ins. Discover open-source, self-hosted tools to ensure privacy while unleashing data's full power.
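A minimal sketch of the NER step only, using spaCy's small English model (the talk's pipeline goes further, generating synthetic stand-ins rather than simple masks):

```python
import spacy

# Requires: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("John Smith paid 500 euros to Acme Bank on 3 May.")

redacted = doc.text
for ent in reversed(doc.ents):  # replace right-to-left so offsets stay valid
    redacted = redacted[: ent.start_char] + f"[{ent.label_}]" + redacted[ent.end_char :]
print(redacted)  # e.g. [PERSON] paid [MONEY] to [ORG] on [DATE].
```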
What is the path from proof of concept to a large production solution for a GenAI application? How do you make it scalable, and how do you get to a release within seven sprints?
The Model Context Protocol (MCP) is an emerging standard that enables structured data provisioning for LLMs and AI agents. However, the current data discovery mechanism in MCP is static. This limits the AI’s ability to dynamically assess the utility, relevance, and efficiency of data tool calls in real time. Here I present an enhancement to MCP "tool discovery" that introduces dynamic data descriptions, allowing the LLM to be better informed.
We built a cutting-edge speech-to-text model that outperformed solutions from industry leaders like Microsoft, Google, and OpenAI. This is the story of how we identified a market gap, defined what makes "the best model," and turned our vision into a successful business.
Fairness and safety are fundamental criteria for building trustworthy and high-quality AI systems, whether they are credit scoring models, hiring assistants, or healthcare chatbots. But what does it truly mean for an AI system to be fair and safe? In this talk, I will explore the potential risks and challenges associated with these principles and introduce various approaches and techniques for evaluating AI systems. The discussion will center on applications powered by Large Language Models (LLMs).
Financial transactions generate vast amounts of sequential data, yet traditional risk assessment models often rely on predefined features that may not capture the full complexity of user behavior. This talk explores how transaction embeddings—inspired by techniques from NLP and Computer Vision—can transform financial modeling.
The more users your platform attracts, the more unwanted attention you'll get from people looking to game your system. While every product is unique, the journey of tackling these bad actors tends to follow similar patterns across companies. In this talk, I'll walk you through the three stages of platform protection that I've witnessed firsthand, and how to level up your safety game using the data you have.
Modern work demands constant context-switching—emails, notes, meetings, and tasks pile up, leaving us overwhelmed. This talk introduces a slow productivity AI approach, inspired by Cal Newport, that leverages offline, open-source automation using Hugging Face, n8n, and Obsidian. By structuring knowledge into meaningful tasks without disrupting deep work, we can create a sustainable, low-distraction workflow—working smarter, not just faster.
2025 is positioned to be the year of Agentic Systems - AI agents capable of autonomous decision-making that are transforming the software landscape. In just a few months, we have already seen the technology transition through multiple hype cycles. In this talk, we’ll cut through the noise to explore the real challenges and emerging opportunities in the space. We will examine where we are, where we're headed, and what it all means for developers shaping the future of AI.