2025-10-01 – Louis Armand 1 - Est
In the world of data, inconsistencies and inaccuracies often present a major challenge to extracting valuable insights, yet the number of robust tools and practices to address them remains limited. In particular, test-driven development (TDD), a standard practice in classic software development, remains difficult to apply in data science, partly because of poorly adapted tools and frameworks.
To address this issue we released Pelage, an open-source Python package that facilitates data exploration and testing, relying on Polars' intuitive syntax and speed. Pelage empowers data scientists and analysts to streamline data transformation, enhance data quality and improve code clarity.
We will demonstrate, in a test-first approach, how you can use this library in a meaningful data science workflow to gain greater confidence in your data transformations.
See website: https://alixtc.github.io/pelage/
Outline of the presentation:
• For juniors, a small demonstration showing the transition from Pandas to Polars.
• Using Pelage makes data preparation and testing more efficient and effective.
• Interactive session with live coding, showcasing the ease of use and practical applications for data practitioners.
• Open discussion on the potential of Pelage for enhancing collaboration between data analysis, data engineering, and SQL workflows.
Key points:
• The first open-source package focused on data testing for the Polars API.
• Ease of integration: an intuitive syntax that leverages Polars' speed.
• dbt-inspired tests (and more) that ease the transition from SQL workflows.
Target audience:
• Data scientists/analysts using Pandas or Polars.
• Data practitioners who encounter data quality issues.
• Data analysts/scientists/engineers interested in bridging the gap between Python and data pipelines written in SQL.
I work as a Data Scientist at Renault Digital. My missions encompass:
- Maintaining our MLOps pipeline
- Co-animating weekly best-practices sessions for data scientists (40+ attendees)
- Acting as DevOps relay inside the data science Team
- Working with industrial and manufacturing plants to reduce their operating costs
Previously, I spent six years in neuroscience research at various public institutions.