PyCon DE & PyData 2025

Introducing the Synthetic Data SDK - Privacy Preserving Synthetic Data for AI/ML
2025-04-23, Helium3

AI-generated synthetic data is gaining traction as a privacy-safe solution for data access and sharing. This data is created from original datasets, maintaining privacy without compromising utility.

In this session, we'll cover the fundamental concepts of AI-generated synthetic data and demonstrate how easy it is to generate synthetic data within your local compute environment using the open-source Synthetic Data SDK.


Privacy regulations are tightening globally, making it increasingly challenging for organizations to access and share data while ensuring compliance.

AI-generated synthetic data is gaining traction as a privacy-safe solution for data access and sharing. This data is created from original datasets, maintaining privacy without compromising utility.

MOSTLY AI has recently released an efficient and flexible Synthetic Data SDK under the fully permissive Apache 2.0 license, empowering anyone to generate high-quality synthetic data with top-tier performance. Powered by the TabularARGN model architecture, the SDK delivers training times 10x to 100x faster than existing models while achieving a state-of-the-art fidelity-privacy balance.

In this session, we'll cover the fundamental concepts of synthetic data and demonstrate how easy it is to generate synthetic data directly from a Jupyter Notebook using the Synthetic Data SDK. Specifically, we will go through:
- Installing the Synthetic Data SDK
- Loading original data into the SDK and locally creating a Generator
- Using a Generator to create different versions of synthetic data
- Uploading a Generator to the MOSTLY AI Platform and sharing it with the world
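
The first three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the session's actual notebook: the helper names `synthesize` and `run_locally`, the CSV path, and the row counts are hypothetical, and the SDK calls (`MostlyAI(local=True)`, `train`, `generate`, `data`) follow the project README at the time of writing, so check the repository for the current API. Local mode is assumed to be installed with `pip install -U 'mostlyai[local]'`. The upload step is left out because it requires a MOSTLY AI Platform account.

```python
def synthesize(client, original, sizes):
    """Train one Generator on `original`, then sample one synthetic dataset per size.

    `client` is expected to behave like a MostlyAI SDK client, i.e. to offer
    train(data=...) and generate(generator, size=...), as assumed from the README.
    """
    # Train a Generator on the original data.
    generator = client.train(data=original)
    # Reuse the same Generator to create several synthetic datasets.
    datasets = []
    for n_rows in sizes:
        synthetic_dataset = client.generate(generator, size=n_rows)
        datasets.append(synthetic_dataset.data())
    return datasets


def run_locally(csv_path, sizes=(1_000, 10_000)):
    """Hypothetical entry point: run the workflow within the local compute environment."""
    # Imported lazily so that this module loads even without the SDK installed.
    import pandas as pd
    from mostlyai.sdk import MostlyAI

    original = pd.read_csv(csv_path)
    client = MostlyAI(local=True)  # local mode, no platform account needed
    return synthesize(client, original, list(sizes))
```

In a notebook, calling `run_locally("my_data.csv")` with the path to your own dataset would return one synthetic DataFrame per requested size. Passing the client in as an argument keeps the workflow independent of whether it runs locally or against the platform.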

This will be a hands-on session - so come with your laptop and ideally a dataset that you'd like to synthesize!


Expected audience expertise: Domain:

None

Expected audience expertise: Python:

Intermediate

Public link to supporting material, e.g. videos, Github, etc.:

https://github.com/mostly-ai/mostlyai

Michael co-founded MOSTLY AI in 2017, led the company as CEO until 2020, and then transitioned to the CTO role. Michael is a world-class data scientist who held leading positions at Microsoft and Nokia before founding MOSTLY AI. He was awarded the Global Marketing Research Award by the American Marketing Association. He holds a PhD from the Vienna University of Economics and Business and a Master's degree from the Vienna University of Technology. Michael is a proud dad of two daughters and passionate about all kinds of sports, including running, biking and baseball.