Code Cleanup: A Data Scientist's Guide to Sparkling Code
2023-04-19, Kuppelsaal

Does your production code look like it’s been copied from Untitled12.ipynb? Are your engineers complaining about the code, but you can’t find the time to improve the code base? This talk will go through some of the basics of clean coding and how best to implement them in a data science team.


Data scientists often have different backgrounds and priorities from software engineers. A lot of the code data scientists write never makes it to production, and as a result, it might not always meet the same standards as production-ready code in a developer team. While it makes sense to have rather lax requirements on code for one-off analyses, this can lead to difficulties in maintaining production code and collaborating on projects with software engineers. Since production code is not (always) the main output of a data science team, it can also be hard to prioritize code quality.

In this presentation, we will go over some of the main principles of clean code and talk about practical steps that data science teams can take to improve their code. We will specifically focus on strategies that teams can implement to slowly and steadily improve the existing code base. This talk is aimed at data scientists who may not have a strong background in software engineering, but are interested in improving code quality and collaborating more effectively with software engineering teams.


Abstract as a tweet

Does your production code look like it’s been copied from Untitled12.ipynb? Are your engineers complaining about the code but nobody has time to clean things up? Check out this talk to learn some of the basics of clean coding and how to implement them in a data science team.

Expected audience expertise: Python

Intermediate

Expected audience expertise: Domain

Intermediate

Corrie Bartelheimer first became interested in data when studying topological data analysis during her Master's in mathematics. After working for a few years in Berlin and organizing the Berlin Bayesian meetups, she moved to Brussels, Belgium, where she now works as a Data Scientist in the hospitality industry. Her interests include, among others, Bayesian modelling, network analysis, data visualization and best practices for data science teams.
In her free time, she enjoys cooking for friends and sampling new Belgian beers.