2024-06-15 –, E105 (capacity 70)
In today's fast-paced business environment, and especially with the advent of machine learning (ML), organizations are looking for ways to derive better insights from their data as quickly as possible. However, implementing a complete ML pipeline can be challenging. It is even harder if you want to process newly arrived data immediately, or if you have a legacy system that is not easy to connect to your modern infrastructure. Change Data Capture (CDC) has emerged as a technology for delivering real-time data changes from various sources, especially databases. In this talk we will introduce Debezium, a leading open source framework for CDC. We will discuss how it can be used to ingest data from various databases into ML frameworks such as TensorFlow, and what the pitfalls are if you go this route. We will also briefly discuss possible future improvements in this area, especially possible integration with emerging ML feature store technology.
The talk will be accompanied by a demo showing the well-known example of recognizing handwritten digits, using a TensorFlow model and images stored in a Postgres database. All in real time.
Attendees will gain an understanding of how Debezium CDC works, how it can help them ingest data from a source database into an ML framework in real time, and what the possible challenges of this approach are.
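To give a flavor of the ingestion step, here is a minimal sketch of turning one Debezium change event into model input. It is not the demo code from the talk. It assumes the events are serialized as JSON, that the captured table has hypothetical columns `id` and `pixels`, and that binary column values arrive base64-encoded (which depends on the connector's binary handling configuration). The `op`, `before` and `after` fields are part of the Debezium change event envelope; the function name `event_to_pixels` is made up for illustration.

```python
import base64
import json


def event_to_pixels(raw_event, image_column="pixels"):
    """Return (row id, normalized pixel list) for a row that exists after the change.

    Returns None for deletes, tombstones and other events without usable row data.
    The column names "id" and "pixels" are hypothetical.
    """
    if raw_event is None:
        # A tombstone record (null value) may follow a delete event.
        return None

    event = json.loads(raw_event)
    # When schemas are included in the message, the envelope sits under "payload";
    # otherwise the envelope is the top-level object.
    payload = event.get("payload", event)

    # "c" = create, "u" = update, "r" = snapshot read. Deletes ("d") carry no "after" row.
    if payload.get("op") not in ("c", "u", "r"):
        return None

    row = payload["after"]
    # Assumption: the binary column is delivered as a base64 string.
    raw_bytes = base64.b64decode(row[image_column])
    # Scale 0..255 byte values into 0.0..1.0, as is common for image models.
    return row["id"], [b / 255.0 for b in raw_bytes]


# A hand-built sample event with a three-pixel "image", for illustration only.
sample = json.dumps({
    "payload": {
        "op": "c",
        "before": None,
        "after": {
            "id": 1,
            "pixels": base64.b64encode(bytes([0, 255, 51])).decode("ascii"),
        },
    }
})

row_id, pixels = event_to_pixels(sample)
```

In a real pipeline the resulting pixel list would be reshaped to the image dimensions the model expects before being passed to it; where the events are read from (for example a Kafka topic) is left out of this sketch.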
Vojtech is a software engineer at Red Hat, currently working as a core developer of the Debezium change data capture framework. He is interested in distributed systems and related areas.