PyCon Lithuania 2024

09:30
09:30
240min
Data Processing with Apache Spark and Apache Iceberg
Tomas Peluritis

"Data Processing with Apache Spark and Apache Iceberg" is a dynamic workshop designed to equip data professionals with advanced skills in managing and processing large-scale data. Participants will be introduced to the essential table formats before delving into Apache Iceberg's integration with Apache Spark. This session focuses on practical applications, including schema evolution and efficient file management, to enhance data processing efficiency and scalability. Ideal for data engineers and scientists,

Data
Tutorials 1
09:30
240min
Introduction to Polars DataFrames - how to supercharge your data workflows
Marco Gorelli

Polars is the new dataframe on the block taking the world by storm.

You'll learn:
- what Polars is, and what it can do for you
- Polars basics and core concepts (including expressions and lazy computation)
- how to work with different datatypes, and how the List datatype gives you superpowers
- interoperability with other tools: NumPy, SciPy, Arrow, pandas, Numba
- migrating from pandas

What better way to learn it than by attending a PyCon Lithuania tutorial, delivered by a Polars core dev?

Data
Tutorials 2
14:00
14:00
240min
Aligning and Using an Open-Source LLM
Monika Venčkauskaitė

Large language models are widely used today for various applications: one can detect fraud, analyze and converse with one's own documents, or create a commercial chatbot.
This "Aligning and Using an Open-Source LLM" workshop is an intensive four-hour exploration designed to demystify the world of Large Language Models (LLMs) and open-source frameworks. In the rapidly evolving landscape of artificial intelligence, the effective use of LLMs has become a crucial skill for machine learning engineers.

Tutorials 1
09:00
09:00
25min
Opening

Introduction to the event and the day.

Room 111
09:30
09:30
60min
Keynote by Daniel Roy Greenfeld
Daniel Roy Greenfeld

My mother loves Šaltibarščiai.

Keynote
Room 111
11:00
11:00
30min
Django FTL: Resolving bottlenecks on the path to high performance.
Maxim Danilov

Raw Django rarely takes first place when comparing the performance of Python web frameworks. However, it can be pretty fast if we identify the bottlenecks and find ways to avoid them. Comparing performance and implementation complexity before and after gives us an understanding of which features should be implemented and what can be skipped.

Web
Room 111
11:00
25min
Pythonic Insight: Navigating the Depths of Observability.
Francis Billa

In this presentation, we will explore Observability in a Python web application. We will delve into a real-world application and discuss the importance of observing services. We will focus on the three foundations of Observability: Logs, Metrics, and Tracing.
Discover some tools for observing and monitoring, including a demo of how to integrate DataDog into a Python service. The presentation will show examples of logs and metrics, and demonstrate how to trace a request.

Web
Room 203
11:00
25min
Data Harvest: Unlocking Insights with Web Scraping
Yuliia Barabash

In today's data-driven world, knowing how to gather and analyze information is more critical than ever. Join us for a compact session on using Python for crawling the web and solving real-time problems. We'll cover the basics, and then dive into a practical example of collecting data from the internet.

Web
Room 228
11:30
11:30
25min
Cracking the Code: Decoding Anti-Bot Systems!
Fabien Vauchelles

Join us for a presentation where we share the mysteries of anti-bot systems, guarding websites, APIs, and mobile applications! 🌐📲

🛠️ What's in Store:

1/ Exploring the Defence Layers
2/ Anti-Bot Reputation Score Demystified
3/ Strategies for Evasion

After this talk, you'll emerge well-equipped with knowledge to navigate and comprehend the nuances of these protective measures! 🚀🔒

Web
Room 228
11:30
25min
Django + HTMX: Democratise Full Stack Web Development
Eimantas Nėjus

Enhance Django with HTMX: Elevate your web applications with seamless client-side interactivity and build dynamic, engaging experiences without page reloads - no React / Vue / Angular required!

Web
Room 111
11:30
25min
What are descriptors and why does Django need them?
Rodrigo Girão Serrão

Frameworks like Django use advanced Python features to provide devs with the magical tools they know and love.

In this live-coded talk we’ll take a look at a couple of Django snippets that use descriptors under the hood and we’ll use them as motivating examples for why Python needs descriptors.

By the end of the talk, you’ll understand how descriptors work and how they power Django behind the scenes.
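
As a rough illustration of the descriptor protocol this talk refers to, here is a minimal sketch in plain Python. The `PositiveNumber` and `Product` names are hypothetical; they are not taken from Django or from the talk itself.

```python
class PositiveNumber:
    """A data descriptor that validates values on assignment (hypothetical example)."""

    def __set_name__(self, owner, name):
        # Called once when the owning class is created; remember where to store the value.
        self.storage_name = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class itself: return the descriptor object.
            return self
        return getattr(instance, self.storage_name)

    def __set__(self, instance, value):
        if value <= 0:
            raise ValueError("value must be positive")
        setattr(instance, self.storage_name, value)


class Product:
    price = PositiveNumber()

    def __init__(self, price):
        self.price = price  # goes through PositiveNumber.__set__


item = Product(10)
print(item.price)  # attribute access goes through PositiveNumber.__get__
```

Assigning a non-positive value, such as `item.price = -1`, raises `ValueError`, because the assignment is intercepted by the descriptor rather than written directly to the instance.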

Web
Room 203
12:00
12:00
25min
Building and Scaling an AI Startup with Async Django
Dr Piotr Gryko

Django's async capabilities and batteries-included tooling make it an ideal framework for quickly building MVPs and iterating. This talk demonstrates building a document search MVP with Django templates, ChromaDB, and hosted large language models. It then shows how to refactor and scale it using Elasticsearch, Celery/RabbitMQ workers, React, self-hosted vLLM, and auth. With Django async, you can rapidly build, constantly improve, and deploy the latest AI models in your product.

Web
Room 111
12:00
25min
How to Utilize Machine Learning for Better Web Scraping
Tadas Gedgaudas

Join Tadas Gedgaudas in an enlightening talk on revolutionizing web scraping with machine learning. Uncover how ChatGPT can adapt to website layout changes, making scraping more efficient and reducing maintenance needs. Delve into data structurization with ML, the seamless integration of ChatGPT for parsing, and its practical impact for developers.

Web
Room 228
12:00
25min
OpenSearch, Python, and Serverless for Modern Search Applications
Laysa Uchoa

Facing challenges with search capabilities in your web applications? Discover how the combination of OpenSearch, Python, and serverless architecture can be your solution. This talk provides hands-on examples, from building efficient queries to implementing production-ready practices. You'll gain actionable insights and the practical know-how to build and deploy robust, query-efficient search applications that solve real-world challenges.

Web
Room 203
13:30
13:30
120min
FastDjango: Conjuring Powerful APIs with the Sorcery of Django Ninja
Julius Boakye

Dive into the world of modern web development by fusing the power of Django and FastAPI. This talk will guide you through the process of building robust, scalable, and efficient APIs using Django Ninja, a web framework that combines Django's reliability and FastAPI's speed. We'll explore how to leverage Django's ORM and user authentication while enjoying FastAPI's performance and type checking. Whether you're a Django veteran looking to supercharge your APIs or a beginner eager to learn cutting-edge techniq

Web
Room 219
13:30
25min
Lessons (I Wish I Knew These Before) from Migrating a Farm of Django Projects from On-Premises to AWS with Kubernetes
Justinas Kuizinas

At Corner Case Technologies, we offer clients a service to migrate from on-premises infrastructure to AWS for various purposes, including high availability, cost optimization, and maintainability. Each migration is unique and necessitates thorough preparation for planning, execution, and subsequent development.

In this talk, I will present a specific use case of a migration that we conducted, with a particular focus on the lessons learned during the planning and execution phases.

Web
Room 111
13:30
25min
µDjango 2.0, an asynchronous microservices technique.
Maxim Danilov

A standard Django project involves working with multiple files and folders from the start. Let's see how working with a Django project changes when we have only one file. This solution automatically transforms Django into a microservice-oriented async framework with a "batteries included" philosophy.

Web
Room 203
14:00
14:00
25min
Python behind the scenes of Danske Bank's Cloud Migration at Scale
Romualdas

A glance behind the curtains at how the execution of Danske Bank's Lift and Shift journey to the public cloud is going. Let's take a deep dive into some of the technical challenges and the snek Python stack standing right at the front, helping orchestrate Cloud Migration at Scale.

Web
Room 111
14:00
25min
Scaling, Refactoring and fixing a Django MVP for Production
Dr Piotr Gryko

So you've built an AI startup using Async Django: the MVP looks great and your handful of users love it. Now you need to clean up the MVP so you can scale.
This is Part Two of building an AI startup with Async Django. We talk about moving from ChromaDB to OpenSearch/Elasticsearch, moving document processing steps to Celery/RabbitMQ, self-hosting via vLLM, migrating from Django templates to a React app, and better monitoring and logging

Web
Room 203
14:30
14:30
25min
How we Develop and Maintain a Modern Python Service at Mozilla: Merino as Example
Tadas Korris

At Mozilla, we maintain services that are used by millions of users daily. These services are the backbone of expanding Firefox and providing users with useful features, all while protecting privacy.

Learn how one service, Merino, was planned to meet user needs at scale. This service provides users with search recommendations and suggestions from local and remote providers. Get some insights into how we develop, deploy, monitor, and maintain this modern Python service.

Web
Room 203
14:30
25min
Mastering Web Scraping: Unleash Your Data Extraction Wizardry!
Fabien Vauchelles

Unlock the full potential of web scraping with this session! From novice to virtuoso, join us on an exciting journey of data extraction as we unravel secrets and advanced techniques.

🔍 Session Highlights:
1/ Building Web Scrapers - The Art Unveiled 🛠️
2/ Proxy and Browser Farms Adventure 🌐
3/ Scrapoxy Orchestration - Elevate Your Scalability 🚀
4/ Protection Measures Disclosed 🔒

This concise session will immerse you in the fascinating world of web scraping.

Web
Room 228
14:30
25min
Saying bye to the Keyboard, Hello to Alexa with Python AWS Lambda
Laysa Uchoa, Yuliia Barabash

Join us, and discover how Alexa's ability to recognize and convert speech into text can be used to create applications that break the monotony of your daily routine without the need to use a keyboard at all. We will teach you about the main components of Alexa, how to get started with the Developer console, and how to customize Alexa using our favorite language, Python in a serverless way. We will also demonstrate how to incorporate Alexa into your daily developer life, and you might find that, after this t

Web
Room 111
15:30
15:30
60min
Encode OSS: Funding open source development
Tom

Being an Open Source citizen. I'll be talking about the motivation behind Encode OSS. How we can work towards properly funding open source development, why that's valuable, and what we've been working on lately.

Keynote
Room 111
16:30
16:30
120min
Reception

Reception drinks.

Org
Room 111
09:30
09:30
60min
Dancing with Design
Robert Smallshire

Look at your system's design! Are the major structures and technology choices the result of conscious decisions, or have they emerged as the system has evolved? Is the design stuck in a local minimum while ever more features are piled into the system? How can we design systems which withstand the major forces acting on a solution?

We’ll see why system designers should focus deliberately on the constraints and qualities of system design, and avoid getting too distracted by features.

Keynote
Room 111
11:00
11:00
25min
Is Mojo just a hype?
Maxim Zaks

In May 2023, there was a big buzz in the AI community as a brand-new programming language called 'Mojo' made its debut. People were talking about it in blog posts like: 'Mojo may be the biggest programming language advance in decades'.
In this talk, we'll dive into Mojo, checking out what it promises and where it stands right now, and also pondering what the future could hold for it.

Target Audience: Software Developers
Prerequisites: General knowledge about programming languages

Python
Room 111
11:00
55min
Let’s create a Python Debugger together
Johannes Bechberger

Debuggers are indispensable tools for all Python developers, empowering them to conquer bugs and unravel complex systems. Let's create our own.

Python
Room 203
11:00
55min
Python package creation using bleeding edge toolset
Albertas Gimbutas

We will create a new Python package from scratch using best practices and deploy it to pypi.org. We will also learn the benefits of bleeding edge tools for code linting, unit testing and deployment, and how to use them. Let's make the Python ecosystem even more awesome!

Python
Room 218
11:00
25min
Analyzing STDF production test data in the silicon manufacturing industry using construct
Franz Haas

Neither the amount of data nor the complexity of the queries is particularly large in this industry. The challenge comes from using the STDF format, a binary file format with roots in the 1980s.

A method to make this data source available to modern data analysis tools (Jupyter/Streamlit) using the construct library will be discussed. The focus is on how the data can be collected, converted and made available in a fast and efficient way, using both PyPy and CPython.

Python
Room 228
11:30
11:30
25min
Unleashing Python's potential with MAX Platform
Antanas Daujotis

The speech will address Python's limitations in AI and how MAX Platform can overcome them by offering superior speed, seamless Python code execution, and hardware compatibility. It will inspire Pythonistas to explore MAX Platform and unlock new possibilities in AI development and beyond.

Python
Room 111
11:30
25min
Watsonx: A GenAI platform that's built for business
Robert Dzisevič

The hype around GenAI keeps rising. Nowadays, almost every company wants to adopt this technology in their business, but successfully delivering a GenAI project takes much more than just figuring out what to ask ChatGPT. During the presentation, I'll introduce you to an AI platform that allows users to deliver GenAI projects with confidence.

Python
Room 228
12:00
12:00
25min
Designing for tomorrow's programming workflows
Matthew Honnibal

New tools are changing how people program, and even who programs. Type hints, modern editor support and, more recently, AI-powered tools like GitHub Copilot and ChatGPT are truly transforming our workflows and improving developer productivity. But what does this mean for how we should be writing and designing our APIs and libraries?

Python
Room 111
12:00
25min
Grokking Event-Driven Web App with Python
Tung Hoang

Crafting scalable event-driven applications using Python can be a tricky endeavor, requiring careful consideration of various factors, from understanding synchronous and asynchronous network calls to tackling the Python Global Interpreter Lock (GIL) bottleneck and implementing robust auto-scaling strategies. This talk delves into advanced techniques and concepts for designing and implementing scalable event-driven applications with Python, empowering you to overcome these challenges effectively.

Python
Room 228
12:00
25min
Lessons Learned From Maintaining SDK in Python for Three Years
Adam Furmanek

Let’s see how to build an SDK that works for years and is used by other developers. We’ll learn which patterns actually work, how mistakes made in the early stage affect the software years later, and how to make sure we don’t break users’ code when introducing changes.

Python
Room 203
12:25
12:25
95min
Lunch

Lunch

Org
Room 111
14:00
14:00
25min
Deadcode - a tool to find and fix dead (unused) Python code
Albertas Gimbutas

A newly developed Python package, deadcode, which detects and automatically fixes unused Python code, will be introduced. Real-world scenarios in which deadcode saves development time will be provided. The main features and options of the deadcode package will be presented, and it will be shown why this tool is superior to vulture. Some implementation details and complexities will also be discussed.

Python
Room 228
14:00
45min
Object Oriented Programming the way it should be
Laimonas Sutkus

While Functional Programming gains traction, I'll showcase how OOP, done right, yields clean, efficient code. Explore a fresh perspective, gain insights, and reshape your coding approach.

Python
Room 203
14:00
55min
Python mokytojai (Python teachers)

.

Room 218
14:00
25min
The role of Rust, Zig and C++ in the Python ecosystem
Cristián Maureira-Fredes

Python's ecosystem is one of the best out there, and this is mainly due to its community and what lies inside its core, a C API.

Being partially in C enables Python to interact with many languages out there which might be known by you like C++, Rust or Zig. But how does it work?

In this talk, you will learn how Python can embrace the power and performance of other languages in order to expose modules that improve the whole ecosystem.

Python
Room 111
14:30
14:30
25min
Kill All Mutants! (Intro to Mutation Testing)
Dave Aronson

Agenda:
- What is mutation testing?
- Why isn't test coverage enough?
- What are its pros and cons?
- How does it work (overview and details)?
- Simple example (finding and fixing bad test)
- Complex example (finding and fixing bad/missing test)
- Complex example (finding and fixing redundant code)
- FAQs -- history, why it's so CPU/RAM intensive, and more if time allows
- Unusual applications, if time allows
- Wrapup
- Q&A
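
To make the idea behind the agenda concrete, here is a minimal hand-written sketch of a single mutant, assuming a hypothetical `is_adult` function. Real mutation testing tools generate and run such mutants automatically; this only shows why a test suite with full line coverage can still let a mutant survive.

```python
def is_adult(age):
    """Original code under test (hypothetical example)."""
    return age >= 18


def is_adult_mutant(age):
    """Mutant: the comparison operator >= was changed to >."""
    return age > 18


def weak_test(func):
    # Only checks values far from the boundary, so it cannot tell the two apart.
    return func(30) is True and func(5) is False


def strong_test(func):
    # Adds the boundary value 18, where original and mutant disagree.
    return weak_test(func) and func(18) is True


# A mutant "survives" when the tests still pass against it.
weak_kills_mutant = not weak_test(is_adult_mutant)
strong_kills_mutant = not strong_test(is_adult_mutant)
print(weak_kills_mutant, strong_kills_mutant)
```

Both tests execute every line of `is_adult`, yet only the one that checks the boundary value detects the changed operator.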

Python
Room 228
14:30
25min
Using Rust & PyO3 to make Pydantic v2 even faster
David Hewitt

In this talk we'll review some of the changes we've made to Pydantic since 2.0 to push performance even further. This is possible largely because Pydantic chose to implement the core in Rust. We'll focus on two main topics:

  • Come learn about optimizations Pydantic has been working on since 2.0
  • Come see our draft ideas for how Pydantic v3 could be even faster than v2

You should leave this talk excited about performance wins for your apps using Pydantic and inspired to try Rust in your own code.

Python
Room 111
15:00
15:00
30min
503 days working full-time on FOSS: lessons learned
Rodrigo Girão Serrão

I've been working full-time on a Python FOSS project for 503 days, so what did I learn?

Am I a better (Python) programmer?
Better teammate?
Better person?

In this talk I will share some lessons I learned over the course of these 503 days:

  • how to get a tech job in this day & age
  • how to put your ego aside
  • how to deal with mistakes
  • how to interact with users & contributors online
  • how it feels to collaborate on a large codebase

As for the first 3 questions... Ask my colleagues!

Python
Room 228
15:00
25min
Building Open Climate Change Information Services in Python
Trevor James Smith

Performing climate science within the context of climate change requires creative solutions to challenges such as data collection and storage management, optimizations for better memory and CPU usage, in addition to ensuring that analysis outputs are trustworthy. This talk will showcase xclim and finch, two pieces of software built for performing climate analyses on large datasets using Python, WPS, and the PANGEO software stack of technologies.

Python
Room 203
15:00
25min
Pointers? In My Python?
Eli Holderness

Learn about Python's memory handling, including:
- what pointers are, and why they matter
- what object IDs are, and what they mean
- how CPython can tell when you're done with an object, and what happens next

No C knowledge required!
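
A small sketch of the ideas listed above, using only the standard library. The `Node` class is hypothetical, and the behaviour shown (immediate cleanup once the last reference goes away) is specific to CPython's reference counting.

```python
import sys
import weakref


class Node:
    """Hypothetical class; plain object() instances cannot be weakly referenced."""


a = Node()
b = a  # a second name bound to the same object, no copy is made

same_object = a is b       # identity comparison
same_id = id(a) == id(b)   # id() is unique among objects alive at the same time

# Compare reference counts with two names and with one name bound to the object.
count_with_two_names = sys.getrefcount(a)
del b
count_with_one_name = sys.getrefcount(a)

# A weak reference lets us observe the object without keeping it alive.
watcher = weakref.ref(a)
del a
# In CPython, the object is freed as soon as its last strong reference is removed.
print(same_object, same_id, watcher() is None)
```

The absolute values returned by `sys.getrefcount` vary between interpreter versions, which is why the sketch only compares the count before and after `del b`.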

Python
Room 111
15:00
45min
Simplifying large Python projects by distributing complexity.
Maxim Danilov

An overcomplicated project increases development and maintenance time.
If a complete redesign is not possible, we can distribute the complexity across the existing codebase.
If AI assistants cannot help us with this task yet, we should discuss manual methods and tools that can be useful.
Using examples from real large projects, we will see that, despite different business types and geographical and social contexts, these projects share similar architectural mistakes, and discuss how they can be redesigned.

Python
Room 218
15:30
15:30
25min
A 101 in time series analytics with Apache Arrow, Pandas and Parquet
Zoe Steinkamp

Columnar databases are on the rise! They provide an efficient and scalable data warehouse for many use cases including time series data. The problem? Many conventional database drivers and querying methods become the bottleneck for data processing and analytics within our client-side applications. Learn how to leverage open-source projects like Apache Arrow Flight and Apache Parquet alongside industry-standard analytics tools to build the foundations of a performant analytics application.

Data
Room 228
15:30
25min
Deep Dive into Asynchronous SQLAlchemy - Transactions and Connections
Damian Wysocki

SQLAlchemy is one of the most popular ORM libraries in Python. In this talk I will present caveats and gotchas that other Pythonistas may encounter while writing an asynchronous backend application using SQLAlchemy as an ORM. We will mainly focus on how SQLAlchemy handles transactions and connections to the database and what issues we may face because of it.

Python
Room 203
15:30
25min
The Ghosts of Distant Objects
Ben Clifford

Sometimes you have a Python object and you want it somewhere else: maybe you want to save your data to disk and load it again tomorrow; or you want to send some complex parameters over the network.
I'll talk about pickle - the usual way to do this, including ways it can go wrong, how to extend it, compare it to other approaches like JSON or storing in a database; and I'll stick a little bit of theory in my talk too.
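
As a minimal sketch of the comparison mentioned above, the following uses only the standard library to round-trip a small structure through pickle and then tries the same with JSON. The sample data is made up for illustration. Note that unpickling data from an untrusted source can execute arbitrary code, which is one of the well-known ways pickle can go wrong.

```python
import json
import pickle

shared = [1, 2, 3]
data = {"coords": (1.5, 2.5), "tags": {"a", "b"}, "first": shared, "second": shared}

restored = pickle.loads(pickle.dumps(data))

# pickle keeps Python-specific types and shared references intact.
print(type(restored["coords"]), type(restored["tags"]))
print(restored["first"] is restored["second"])

# JSON cannot encode a set at all.
json_error = None
try:
    json.dumps(data)
except TypeError as exc:
    json_error = str(exc)

# JSON also has no tuple type, so a tuple comes back as a list.
json_coords = json.loads(json.dumps({"coords": data["coords"]}))["coords"]
print(json_error, json_coords)
```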

Python
Room 111
16:30
16:30
60min
LLMs: when to use them and when to avoid them
Arjan Egges

At ArjanCodes, we use LLMs in various ways. They are part of the content we produce, we use them in platforms we develop, such as Learntail, they are integrated in automations that streamline our internal processes, and they’re part of our personal workflows, whether that’s for sales and marketing, operations, or software development.

In this talk, I’ll go over all of these use cases and share the things that we learned from working with LLMs and where LLMs provide us with the most value. Hopefully this wi

Keynote
Room 111
17:30
17:30
180min
Wargaming Quiz

.

Org
Room 111
09:30
09:30
60min
Keynote Polars
Ritchie Vink

Polars is an OLAP query engine that focusses on the DataFrame use case. Machines have changed a lot in the last decade, and Polars is a query engine written from scratch in Rust to benefit from modern hardware.

Effective parallelism, cache efficient data structures and algorithms are ingrained in its design. This talk will go through recent changes and plans of the project.

Keynote
Room 111
11:00
11:00
25min
DataFrame interoperability - what's been achieved, and what comes next?
Marco Gorelli

In 2023, we saw several libraries - which had previously only supported pandas - add support for other dataframe libraries such as Polars, Modin, and cuDF.

  • How did they do it?
  • Are there any drawbacks to how they did it?
  • What comes next, and what other solutions are there?

This talk could be of interest to anyone working with dataframes. In particular, those maintaining or contributing to libraries which use dataframes will learn about how they can best support multiple dataframe libraries.

Data
Room 111
11:00
25min
ML Model Serialization: Improving Efficiency and Flexibility
Jonas Jarutis

Machine learning (ML) model serialization helps to optimize inference latency, memory, and disk space requirements and provides more options for model deployment. We will explore the use cases that benefit the most from this technique and some drawbacks.

Data
Room 228
11:00
25min
The pragmatic Pythonic data engineer
Robson Junior

Learn to make practical decisions in data engineering with Python's vast ecosystem. Avoid blindly following market guidelines and consider the reality of your situation for better performance and architecture.

Data
Room 203
11:30
11:30
25min
Transcend the Knowledge Barriers in RAG: Setup, Chat State, and More
Isaac Chung

Developer tools power many LLM-based chat and Retrieval Augmented Generation applications today. However, there is a non-trivial knowledge barrier for entrants that could hinder developer experience. Our discussion intends to offer actionable insights into building and maintaining generative AI solutions in a secure and economical way, thereby improving the developer experience in this Generative AI wave.

Data
Room 228
11:30
25min
Write-Audit-Publish Pattern in Modern Data Pipelines
Tomas Peluritis

Data is the new oil, and like oil it can leak and poison the surrounding environment. What happens if you pollute one of the datasets used in decision-maker-facing dashboards? In this talk, I will explain the reemergence of the Write-Audit-Publish pattern and how you can achieve it using Apache Iceberg and Apache Spark.

Data
Room 203
11:30
30min
functime: a next generation ML forecasting library powered by Polars
Luca Baggi

Polars conquered dataframes, and now it is coming for machine learning! With Polars-powered feature-extraction and a best-of-the-class set of diagnostic tools, functime enables forecasting thousands of time series all at once, from the comfort of your laptop.

Though forecasting practitioners are the intended audience, the talk has something for every data scientist. With Polars, we can push the boundary for what "reasonable scale" means - and build a new generation of tools for machine learning.

Data
Room 111
12:00
12:00
25min
Generative AI in Lithuanian language
Vytautas Bielinskas

A presentation about how we (a few local NLP enthusiasts) trained a Transformer language model to generate meaningful text in the Lithuanian language. Everything was based on volunteer work with a strong R&D flavor.
During this presentation I will not only cover what kind of data we used to train this model and what results we got, but also present other initiatives we are driving in the NLP field. I will try to make the presentation both technical and interactive.

Data
Room 228
12:00
25min
Revenue based scoring in `GridSearchCV`: a case for the new metadata routing in scikit-learn
Adrin Jalali

Passing metadata such as sample_weight and groups through a scikit-learn cross_validate, GridSearchCV, or a Pipeline to the right estimators, scorers, and CV splitters has been either cumbersome, hacky, or impossible.

The new metadata routing mechanism in scikit-learn enables you to pass metadata through these objects. As a use-case, we study how you can implement a revenue sensitive scoring while doing a hyperparameter search within a GridSearchCV object.

Data
Room 203
12:00
25min
Streaming DataFrames: A New Way to Process Streaming Data in Python
Tomáš Neubauer

Introducing an open source library in Python: Quix Streams. It solves all the complexities of stream processing in a cloud native package with a familiar Pandas DataFrame API. This library lets you work with data as if it were static in your Jupyter Notebook, without any of the hassle associated with streaming technologies. Our mission is to bring masses of Python developers into streaming and make the journey as smooth as possible so real-time applications using ML are not so difficult

Data
Room 111
14:00
14:00
55min
Coding a vector database from scratch
Aurélien Massiot

In 2023, vector databases are attracting great interest, as evidenced by the Google Trends search statistics. This type of database has a direct link with Large Language Models (LLMs), such as ChatGPT, by enabling "Retrieval Augmented Generation" (RAG), for example. This approach offers the possibility of exploiting the power of a conversational agent using our own data.

But... Do you really need a vector database?
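
For a sense of scale, the core of a vector store can be sketched in a few lines of standard-library Python. This is not the speaker's implementation: `TinyVectorStore` is a hypothetical name, the two-dimensional "embeddings" are made up, and the search is brute-force cosine similarity rather than the approximate nearest-neighbour indexes that production vector databases typically rely on.

```python
import math


class TinyVectorStore:
    """Hypothetical in-memory store using brute-force cosine similarity."""

    def __init__(self):
        self._items = []  # list of (label, vector) pairs

    def add(self, label, vector):
        self._items.append((label, list(vector)))

    @staticmethod
    def _cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm else 0.0

    def search(self, query, k=1):
        # Score every stored vector against the query and return the k best labels.
        scored = sorted(
            self._items,
            key=lambda item: self._cosine(query, item[1]),
            reverse=True,
        )
        return [label for label, _ in scored[:k]]


store = TinyVectorStore()
store.add("cat", [1.0, 0.0])
store.add("dog", [0.8, 0.6])
store.add("car", [0.0, 1.0])
print(store.search([0.9, 0.1], k=2))  # ['cat', 'dog']
```

Every query scans all stored vectors, so the cost grows linearly with the collection size; that linear scan is the part a dedicated index is meant to avoid.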

Data
Room 228
14:00
25min
[MLOps] CI/CD in the age of Machine Learning
Emmanuel-Lin Toulemonde

Machine learning models are a new artifact to build, version and deploy. Explore their impact on your architecture.

Data
Room 203
14:00
25min
🧼 From GPU-poor to data-rich: data quality practices for LLM fine-tuning
Gabriel Martín Blázquez, David Berenstein

If you are GPU-poor you need to become data-rich. I will give an overview of what we learned from looking at Alpaca, LIMA, Dolly, UltraFeedback and Zephyr and how we applied that to fine-tuning state-of-the-art open source LLMs called Notus and Notux by becoming data-rich.

Data
Room 111
14:30
14:30
25min
Customizing LLMs: A Guide to Fine-Tuning Open Source Models
Maria Jose Molina Contreras

In today's world, large language models (LLMs) are revolutionizing how we interact with technology, allowing us to have conversations, organize data, and write text with minimal human effort. However, when using an LLM, you have likely received incorrect or insufficiently specialized answers.
For this reason, fine-tuning models that have been pre-trained on a large corpus of data is crucial to: (1) obtain better performance in the quality of responses, and (2) tune the model to a specific domain.

Data
Room 111
15:00
15:00
25min
Data Version Control Done Right with Python and Unity
Nir Ozeri

Python is a leading language of choice for the Databricks and ML ecosystem, alongside a Delta tables stack leveraging Unity Catalog to manage petabytes of structured data. To build and experiment with ML data and models, version control has become the backbone of modern machine learning (ML) projects, bringing critical aspects of reproducibility and experimentation to teams who are able to experiment in isolation, while still collaborating on projects.

Data
Room 203
15:00
25min
Speed up open source LLM-serving with llama-cpp-python
Isaac Chung

Large language models (LLMs) often require huge compute resources to serve. This is a common challenge for those who want to avoid sharing their data with cloud API providers, or to deploy their stack in air-gapped environments. We will take a look at how the open source llama-cpp-python library opens the door to lower hardware requirements and simplifies deployment significantly.

Data
Room 111
15:00
55min
Transforming Data Insights: Creating Dynamic Animated Stories with Python and ipyvizzu-story
Peter Vidos

Unlocking the value of data often hinges on the ability to communicate insights effectively to non-technical audiences. What if you could go beyond static charts and captivate your audience with animated data stories? Join us in this workshop to discover the power of animated storytelling using ipyvizzu-story, an innovative open-source presentation tool designed to work seamlessly within Jupyter Notebook and similar platforms.

Data
Room 228
15:30
15:30
25min
Making an e-shop search bar your friend with Pinecone's hybrid search
Martynas Venckus

Fast and accurate search results are a crucial component of any e-shop and can make the difference between high user satisfaction and user frustration. With recent advancements in vector search technologies, enhanced search systems have become more efficient, leading to better user experiences and improved conversion rates. In this talk, we’ll explore how to implement a hybrid search system for a non-English e-commerce site using Pinecone, a high-performance vector search engine.

Data
Room 203
15:30
25min
RAG on KDTree
Jan Bartnitsky

How to use KDTree from the sklearn library to prototype RAG (Retrieval-Augmented Generation) applications.

Data
Room 111
16:00
16:00
15min
Lightning talks
Room 111
16:30
16:30
60min
The AI Revolution Will Not Be Monopolized: How open-source beats economies of scale, even for LLMs
Ines Montani

With the latest advancements in Natural Language Processing and Large Language Models (LLMs), and big companies like OpenAI dominating the space, many people wonder: Are we heading further into a black box era with larger and larger models, obscured behind APIs controlled by big tech monopolies? I don’t think so, and in this talk, I’ll show you why.

Keynote
Room 111
No sessions on Saturday, April 6, 2024.