PyCon LT 2023


Thursday, May 18, 2023

Friday, May 19, 2023

09:30

60min

Garbage in -> Pydantic -> you're golden!

Samuel Colvin

Pydantic is a data validation library for Python that has seen massive adoption over the last few years. It's used by major data science and ML libraries like spaCy, Hugging Face and Jina AI; overall, Pydantic is downloaded over 55 million times a month!

In this talk, Samuel Colvin, the creator of Pydantic, will cover two subjects that have seen massive interest in recent years:

How Pydantic can be used to prepare data for machine learning, thereby saving time and avoiding errors
The emergence of Rust as the go-to language for high-performance Python libraries: how this might develop in the future, and the benefits and drawbacks of the trend
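To make the "garbage in" idea concrete, here is a minimal sketch assuming Pydantic is installed (the calls used here behave the same in Pydantic v1 and v2). The `Player` model and its fields are hypothetical and not taken from the talk: string inputs are coerced into the declared types, and invalid data raises a `ValidationError` that names the offending field.

```python
from datetime import date

from pydantic import BaseModel, ValidationError


class Player(BaseModel):
    # Hypothetical record shape, used only for illustration.
    player_id: int
    country: str
    signup_date: date
    purchases: list[float] = []


# Messy input: an integer and floats sent as strings, a date as an ISO string.
raw = {
    "player_id": "1042",
    "country": "LT",
    "signup_date": "2023-05-18",
    "purchases": ["4.99", 9.99],
}
player = Player(**raw)
print(player.player_id, player.signup_date, player.purchases)
# 1042 2023-05-18 [4.99, 9.99]

# Input that cannot be coerced is rejected, and the error says where.
try:
    Player(player_id="not-a-number", country="LT", signup_date="2023-05-18")
except ValidationError as exc:
    error_loc = exc.errors()[0]["loc"]
    print(error_loc)  # ('player_id',)
```

Validating once at the boundary means downstream ML code can rely on `player.signup_date` really being a `date` instead of re-checking types everywhere.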

Keynote

Saphire ABC Main

11:00

55min

Analyze your data at the speed of light with Polars and Kedro

Juan Luis Cano Rodríguez

Writing maintainable data science code is a big topic, and different people have different opinions on the best ways to do it. Wouldn't it be nice if there was an opinionated framework to set some structure and help data scientists be more effective and ship their analysis and models to production faster?

In this workshop we present Kedro, an opinionated Python framework for creating reproducible, maintainable and modular data science code. We will also show how you can combine it with Polars, a new dataframe library backed by Arrow and Rust, for lightning-fast data manipulation and exploratory data analysis.

Building Hexagonal Python Services

Shahriyar Rzayev

Enterprise architecture patterns are well known and applicable to many types of tasks. Thinking about architecture from the beginning of the journey is crucial for a maintainable, and therefore testable, flexible code base. In this talk, we are going to explore the Ports and Adapters (Hexagonal) pattern by showing a simple web app that uses the Repository, Unit of Work, and Services (Use Cases) patterns, tied together with Dependency Injection. All these patterns are well established in other languages but relatively new to the Python ecosystem, where they are a crucial missing piece.
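A minimal, stdlib-only sketch of how these patterns can fit together. All names (`Order`, `OrderRepository`, `UnitOfWork`, `place_order`) are hypothetical and not taken from the talk; an in-memory adapter stands in for a real database, and dependency injection is done by hand in a composition root rather than with a DI container.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Order:
    """Domain entity: no knowledge of databases, HTTP, or frameworks."""
    order_id: str
    amount: float


class OrderRepository(Protocol):
    """Port: the persistence interface the core depends on."""
    def add(self, order: Order) -> None: ...
    def get(self, order_id: str) -> Optional[Order]: ...


class InMemoryOrderRepository:
    """Adapter: one implementation of the port (a SQL adapter could replace it)."""
    def __init__(self) -> None:
        self._orders: dict[str, Order] = {}

    def add(self, order: Order) -> None:
        self._orders[order.order_id] = order

    def get(self, order_id: str) -> Optional[Order]:
        return self._orders.get(order_id)


class UnitOfWork:
    """Buffers new orders and writes them to the repository only on commit."""
    def __init__(self, orders: OrderRepository) -> None:
        self.orders = orders
        self._pending: list[Order] = []

    def __enter__(self) -> UnitOfWork:
        self._pending = []
        return self

    def __exit__(self, *exc_info: object) -> None:
        # Anything not committed is discarded, i.e. rolled back.
        self._pending = []

    def add(self, order: Order) -> None:
        self._pending.append(order)

    def commit(self) -> None:
        for order in self._pending:
            self.orders.add(order)
        self._pending = []


def place_order(uow: UnitOfWork, order_id: str, amount: float) -> None:
    """Service (use case): applies a business rule, then persists via the UoW."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    with uow:
        uow.add(Order(order_id, amount))
        uow.commit()


# Composition root: dependencies are injected through constructor arguments.
repo = InMemoryOrderRepository()
uow = UnitOfWork(repo)
place_order(uow, "A-1", 25.0)

# A failure before commit leaves the repository untouched.
try:
    with uow:
        uow.add(Order("A-2", 10.0))
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

print(repo.get("A-1"), repo.get("A-2"))
# Order(order_id='A-1', amount=25.0) None
```

Because `place_order` only talks to the unit of work and the repository port, swapping the in-memory adapter for a database-backed one changes the composition root, not the service.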

Code More, Draw Less: Auto-Generate Software Architecture Visualizations ft. Graph DBs, pandas & Python

Deleted User, Kang Min Bae

Understanding software architecture and how data flows between software components is a vital step toward building and maintaining software systems. Architecture diagrams enable this by combining digital graphical design with human-computer interaction. These visualizations help not only system architects but also developers, project managers, and even customers. Designing them is hard not only because such systems are intangible conceptual entities but also, most importantly, because they are constantly evolving.

While we are searching for life on Mars, our software diagrams remain manual and lifeless. Imagine a world where you update your code and the architecture view updates automatically, ready to be explored interactively. Let's use Graph Databases, pandas, and Python to bring these diagrams to life and make them interactive.
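The talk builds its views with graph databases and pandas. As a swapped-in, stdlib-only illustration of "diagrams derived from code", the sketch below treats imports between a project's own modules as architecture edges (an assumption, not necessarily the speakers' approach) and emits Graphviz DOT text that could be regenerated on every commit. The module names and sources are hypothetical.

```python
import ast


def import_graph(sources: dict[str, str]) -> dict[str, set[str]]:
    """Map each module name to the set of project modules it imports."""
    known = set(sources)
    graph: dict[str, set[str]] = {module: set() for module in sources}
    for module, code in sources.items():
        for node in ast.walk(ast.parse(code)):
            if isinstance(node, ast.Import):
                targets = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                targets = [node.module]
            else:
                continue
            # Keep only edges to the project's own modules, not stdlib/3rd party.
            graph[module].update(t for t in targets if t in known)
    return graph


def to_dot(graph: dict[str, set[str]]) -> str:
    """Render the graph as Graphviz DOT text, in a stable (sorted) order."""
    lines = ["digraph architecture {"]
    for src in sorted(graph):
        for dst in sorted(graph[src]):
            lines.append(f'    "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


sources = {
    "app.api": "from app.services import place_order\nimport json\n",
    "app.services": "from app.db import Session\n",
    "app.db": "import sqlite3\n",
}
graph = import_graph(sources)
print(to_dot(graph))
```

In a real project, `sources` could be built by reading `.py` files with `pathlib.Path.rglob`, and the same edge list could be loaded into a pandas DataFrame or a graph database instead of DOT. Relative imports such as `from .db import Session` are not resolved in this sketch.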

How to Build a Data Science Portfolio That Will Make Recruiters Swipe Right

Karolina Griciunė

Building a strong data science portfolio can be a daunting task, especially for those just starting out in the field.
In this talk, we will explore the essential elements of a successful data science portfolio, and provide actionable advice for building a portfolio that will make recruiters take notice.

First, we will discuss the importance of selecting the right projects for your portfolio. We'll share tips for identifying projects that demonstrate a range of skills and showcase your expertise in a specific area.

Next, we'll talk about how to present your work in a clear and compelling way, including how to structure your portfolio and which tools and platforms to use.

We will also discuss how to incorporate feedback from peers and mentors, as well as how to solicit feedback from potential employers. In addition, we'll cover best practices for maintaining and updating your portfolio, and how to use your portfolio to continue learning and growing in the field of data science.

PyData

Saphire B - PyData

11:05

230min

OPEN SPACE

Malachite B

11:10

25min

Uncle Data session 1

Uncle Data, Samuel Colvin

Uncle Data

Malachite A

11:30

25min

Domain Driven Design Meets Infrastructure from Code: An AWS Credentials Management Case Study

Barbara Toporowska

Domain Driven Design (DDD) and Infrastructure from Code (IfC) are two powerful approaches to building software. DDD helps developers create flexible, scalable applications, and IfC lets them deploy those applications seamlessly to the cloud. By combining the two, we can create a layered architecture in which IfC is just another layer of a DDD app.
To illustrate how we can achieve this, I’ll show an example of an app I developed using DDD principles. To make it work with IfC, I needed to add a configuration layer and use a special Python syntax for the service layer, which enables the IfC engine to compile it. The other layers don't even know that they're running in the cloud, which makes it easy to maintain the application and add new features. This talk will provide insights into how you can leverage the power of DDD and IfC to create robust, scalable, and flexible software applications, and how to incorporate IfC as another layer in your DDD architecture.

How we predict purchases in mobile games

Dima Savostyanov

More than 5 million people play Nordcurrent mobile games every month. A defining trait of free-to-play games is that fewer than 10% of players make purchases. It is essential to retain paying players and keep them engaged for as long as possible, so we built a purchase prediction model.

We store data and do most of the feature engineering in ClickHouse. Apache Airflow orchestrates the pipelines. We usually use CatBoost for machine learning. Pydantic and ClearML, on top of AWS S3, manage model files, training metrics, and configs. Quality in production is evaluated using dashboards in Apache Superset.

This architecture allows us to build fully reproducible ML pipelines. The training process can be scaled horizontally to select optimal hyperparameters. At the inference stage, there is no need to worry that the model was trained in some Jupyter notebook and that nobody knows what to do if it suddenly breaks a month later.

PyData

Saphire B - PyData

12:00

25min

H2O Wave - Build web apps with nothing but Python

Martin Turóci

In the current age of AI, the ability to rapidly develop and deploy applications has become crucial for staying competitive. As demand for AI-powered solutions continues to grow, it's more important than ever to bring new ideas to market quickly and efficiently. The H2O Wave framework enables data science and ML practitioners with the required business knowledge to do just that, without the overhead of a software engineering team in the middle.

This talk will introduce H2O Wave, a Python framework that allows developers to build web applications with minimal web development knowledge. With its high-quality UI widgets, built-in authentication, and developer tooling such as IDE extensions, H2O Wave simplifies the app development process and helps teams bring their AI-powered applications to market faster. Attendees will learn how the framework is already being used by Kaggle Grandmasters to build AI applications and how it can help their own development efforts.

pandas 2.0 and the Arrow revolution

Marc Garcia

pandas 2.0 has recently been released, and one of its key features is improved support for the Apache Arrow in-memory format. While the change is largely internal, it opens up a wide range of possibilities. In this talk we will give a quick overview of pandas and Apache Arrow, what is new in pandas 2.0, how users can benefit from using pandas with Apache Arrow, and what to expect from future pandas releases.

12:15

Uncle Data session 2

Uncle Data, Justinas Kuizinas

Malachite A