2025-03-02, F223
Batch processing becomes challenging when handling continuous data migration with DataProc. In this session, I'll introduce a new approach to continuous data pipelines built on PySpark. Participants will learn techniques for maintaining data consistency and preserving data completeness in a million-row migration from a SQL database to a NoSQL database, MongoDB.
In this talk, I'll walk through the challenging, real-world journey of migrating millions of rows of data from a SQL database to NoSQL with MongoDB, drawn from my own use cases.
The talk covers:
- The business context and technical challenges of a million-row data migration.
- Data pipeline architecture: SQL Server, GCP DataProc, GCP BigQuery, PySpark, and MongoDB Atlas.
- A suggested approach for handling a million-row migration from SQL to NoSQL with MongoDB (see the PySpark sketch after this list).
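To give a flavour of the pipeline pattern, here is a minimal PySpark sketch of a partitioned JDBC read from SQL Server followed by a bulk write into MongoDB Atlas. It assumes the MongoDB Spark Connector 10.x; the host names, credentials, table and collection names, and the id range are placeholders, and the production pipeline discussed in the talk may differ.

```python
# Minimal sketch: partitioned read from SQL Server via JDBC,
# then a bulk write into MongoDB Atlas with the MongoDB Spark Connector (10.x).
# All connection details below are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("sql-to-mongodb-migration")
    # Assumes the SQL Server JDBC driver and mongo-spark-connector jars
    # are already available on the Dataproc cluster.
    .getOrCreate()
)

# Read millions of rows in parallel by splitting on a numeric, indexed key.
source_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://sql-host:1433;databaseName=sales")
    .option("dbtable", "dbo.orders")
    .option("user", "migration_user")
    .option("password", "***")
    .option("partitionColumn", "order_id")   # numeric column used to split the read
    .option("lowerBound", "1")
    .option("upperBound", "10000000")
    .option("numPartitions", "32")           # parallel JDBC readers
    .load()
)

# Light reshaping before landing the documents in MongoDB.
target_df = source_df.withColumnRenamed("order_id", "_id")

# Write to MongoDB Atlas; "mongodb" is the 10.x connector's format name.
(
    target_df.write.format("mongodb")
    .mode("append")
    .option("connection.uri", "mongodb+srv://user:***@cluster.mongodb.net")
    .option("database", "sales")
    .option("collection", "orders")
    .save()
)
```

Partitioning the JDBC read keeps each executor's workload bounded, and writing with a stable `_id` makes it straightforward to re-run a batch and reconcile row counts between source and target when checking completeness.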
Intermediate
Category: Data Science/Analysis/Engineering
Piti Champeethong has over 20 years of experience working with databases. Currently, Piti serves as a Senior Consulting Engineer at MongoDB Singapore. He has spoken at several conferences, including the Global AI Conference 2023 and JavaScript Bangkok 2.0.0. In addition, Piti leads the MongoDB User Group Thailand, a thriving community of 3,000 developers.
Piti is also a Microsoft MVP (DevOps and Python), recognized for his expertise and valuable contributions to the developer community.